A growing problem


Without careful stewardship, genetically engineered crops will do little to stop the spread of
herbicide-resistant weeds.


Palmer pigweed (Amaranthus palmeri) is not a weed to trifle with.
It can reach more than 2.5 metres tall, grow more than 6 centimetres
a day and produce 600,000 seeds, and it has a tough, woody stem that
can wreck farm equipment that tries to uproot it.

It is also becoming more and more resistant to the popular herbicide
glyphosate.

The first such resistant population was confirmed in 2005 in a cotton
field in Georgia, and the plant now plagues farmers in at least 23 US
states. It is just one of many resistant weeds marching through the world.

The US Environmental Protection Agency (EPA) is trying to learn
from the pigweed experience, and wants to limit the damage caused by
the latest wave of weed control. It deserves credit and support.

There is broad agreement that the spread of these resistant plants
has its roots in the widespread adoption of crops engineered to be
resistant to glyphosate. By the time these genetically engineered crops
were released in the mid-1990s, farmers had been battling
herbicide-resistant weeds for decades. But glyphosate was thought to be a
particularly challenging herbicide for weeds to overcome. Few cases of
resistance had been seen.

That was set to change: by 2012, glyphosate-resistant weeds had
infested 25 million hectares of US cropland. They have also appeared
in other countries that have embraced glyphosate-tolerant crops,
including Australia, Brazil and Argentina. Blanketing crops year after
year in the same herbicide is the perfect way to foster resistant weeds.

Chemical companies have come up with a solution: crops engineered
to tolerate multiple herbicides. The likelihood of a weed
becoming resistant to more than one chemical, they claim, is very
small. And, in an eerie echo of the 1990s discussion around glyphosate
tolerance, some even point out that one of the other herbicides being
targeted, the choline salt of an old chemical called 2,4-D, has been
used for decades with little sign of resistance.

It is a flawed argument. Stacking up tolerance traits may delay the
appearance of resistant weeds, but probably not for long. Weeds are
wily: farmers have already reported some plants that are resistant
to more than five herbicides. And with glyphosate-resistant weeds
already in many fields, the chances of preventing resistance to another
are dropping.

Crops resistant to multiple herbicides could be useful. But scientists
are concerned that farmers will rely too heavily on the chemicals,
and neglect other ways to combat the resistance threat. Those include
using a mixture of herbicides that are specific to a field’s invaders,
rotating crops and moderate tilling, practices together known as
integrated weed management. A farmer making good money in the
age of biofuel crop subsidies may be loath to switch to a different crop.
And farmers may be hesitant to invest the money needed to properly
manage weeds, when their farms could end up infested with weeds
from less-assiduous neighbours.

This is where the EPA comes in. In its draft assessment of the
blend of herbicides to be used, it calls for the manufacturer, Dow
AgroSciences of Indianapolis, Indiana, to monitor the emergence of
resistant weeds and report them to the agency. The EPA will then have
the power to impose restrictions on Dow or on the use of the herbicide
if it deems this necessary.

The EPA is soliciting comments on the draft assessment from the
public until the end of June. It offers sensible precautions, but it could
do much more. When an insect-resistant variety of genetically engineered
crop was released, US regulators required farmers to plant nearby refuges
of non-resistant plants to ease the selection pressure on insects to
develop resistance to the crops. Similar measures for herbicide-tolerant
crops might require farmers to rotate crops or herbicides every few years.
That would be a familiar restriction, because many herbicides have limits
on how often they can be used for environmental reasons. Such measures
would be a sign that regulators and farmers alike have realized the
consequences of underestimating the ability of weeds to develop resistance.


Good practice 


Standardized procedures and analyses should 
help to get stem-cell therapies to the clinic. 


Unethical procedures, exploitation and inflated promises: that is
what generally makes the headlines, and so it is with regenerative
medicine and stem cells. Media reports have left the distinct
impression that the research is rather dubious.

First is the long-standing controversy over the source material:
human embryos. Research banned by the most powerful man in
the world (as US President George W. Bush was when he stopped
federal support for such work in 2001) must be a bit dodgy, right?
Then there are the regular reports of companies that are exploiting
vulnerable, and often seriously ill, patients with promises of
expensive, but unproven, miracle cures.

But behind the headlines is a different story. Scientists doing the
systematic research needed to get cellular therapies into the clinic are
finally making headway. Trials are now under way for treating an eye
disorder called macular degeneration using retinal cells. And a trial
using immature glial cells to treat spinal-cord injury has restarted after
the company running it pulled out in 2011 (see Nature 510, 18; 2014).

It has taken many years to get to the starting line, but shortcuts
are simply not possible, despite charlatan claims. It takes time to
learn how to coax stem cells, either from human embryos or
from reprogrammed adult cells known as induced pluripotent stem
(iPS) cells, to develop into the right sort of replacement cell. It
also takes time to work out how to get these cells to integrate into
the host tissue and to function. And the steps required to work out
how many replacement cells need to be delivered, and how to deliver
them safely, cannot be rushed.

The eye and spinal cord are relatively isolated systems. Much will 
be learnt from them, but the brain and heart are altogether more 
complicated. Fixing damage in these systems is crucial, however, 
because together they provide the biggest disease burden in devel- 
oped countries. 

Happily, clinical trials are on the horizon. Treatments for Par- 
kinson’s disease are just a few years away from clinical testing. And 
some for Huntington’s disease may not be far behind. 

Taking any radical therapy into humans requires caution. Ideally, 
researchers should be able to use data from a patient in one trial to 
refine the approach for one in another. So a decision by the Global 
Force for Parkinson's Disease, or G-force, to bring together teams 
from Europe, the United States and Japan to define standards for cell 
preparation and patient selection and monitoring for future trials 
is particularly welcome (see page 195). 

The G-force seems to have learnt the lessons of moving research to 
the clinic too fast and in isolated teams. Multiple trials of cells derived 
from fetal brains to treat Parkinson's disease began in the late 1980s, 
but stopped in 2003 because the outcomes were an uninterpretable 
mishmash. And trials using adult stem cells to treat heart failure 
have shown wildly varying outcomes (see Nature 509, 15–16; 2014),
perhaps owing in part to a lack of good preclinical data. But systematic
research has now shown that heart cells derived from human
embryonic stem cells can engraft into damaged primate hearts and 
synchronize their beats to it, at least to some extent. Some of the mon- 
keys developed arrhythmias, showing that the technique still needs 
improvement. The principle of the therapy has been proven, however, 
which gives confidence that clinical trials may become possible. 

Designing trials to agreed standards will ensure that researchers
can understand why any one patient benefited, or failed to benefit,
from the treatment. This will magnify the efficiency of the trials and
speed up the development of therapies. It is a model that deserves to
be widely copied.

News reports need to be careful not to overhype the potential of
cellular therapies. As the field inches towards clinical testing, it is
important that
researchers clearly communicate to the media what the therapies are 
likely to achieve — and what they are not. Early trials are unlikely 
to show cures, but that does not diminish their value: even small 
improvements in quality of life are important to a person with a 
serious disability. A blind person who becomes able to discern light 
from shade, a paralysed person who regains some feeling in a limb 
and a person with advanced Parkinson's disease who can walk inde- 
pendently, if not normally — each will think it worthwhile. 

Like all new therapies, stem-cell repair will improve through 
trial and error. These approaches promise more trial and, hopefully,
fewer errors.


Open goal 
International researchers can help to improve 
the scientific enterprise in South America. 


Productivity in offices and labs around the world will probably
slip a little during the next month, as football fans tune in to
watch the 2014 FIFA World Cup, which starts in Brazil this
week. Four years ago, nearly half the world’s population tuned in at
some point during the tournament. And as the world focuses its attention
on Brazil, Nature has taken the opportunity to widen the view
with our special issue on science in South America (see page 201).
The package of articles and commentaries details some of the success
stories on the continent as well as the substantial challenges faced by
researchers there as they seek to build scientific institutions in the
wake of decades lost to dictatorships.

They need not struggle alone. From London to Boston to Tokyo,
individual scientists and larger organizations in the developed
world can offer significant help to South American countries.
When Nature asked leading South American scientists what kind
of assistance would bring tangible benefits, the answers invariably
clustered around two key requests to their international colleagues:
host young scientists in your laboratories, and come to visit South
American researchers.

The flow of students from South America to the United States and
Europe has grown in recent years but remains a trickle. Brazil sent
fewer than 11,000 undergraduate and graduate students to the United
States in 2013, fewer than Turkey and Vietnam, countries with much
smaller populations and economies. The tally for all students sent to
US universities from Latin America and the Caribbean was less than
one-third of the number sent by China.

Many South American scientists called on their northern colleagues
to recruit more graduate students and postdoctoral
scientists from the continent. Even short visits of three to six
months can help to train a young scientist. But the exchanges have 
to be done in a way that does not contribute to the brain drain that 
has lured many leading researchers to permanent positions in the 
United States and Europe (see pages 207 and 213). One solution is to 
provide start-up funds for researchers returning to South America. 
For example, after postdoctoral training in the United States, Lino 
Barañao received support from the Rockefeller Foundation to establish
his lab at home in Argentina, where he is now the minister of
science, technology and innovative production. 

Travel needs to go both ways. According to South American 
researchers, too few scientists visit their continent to spend time in 
labs, give lectures and attend meetings. Even virtual visits, through 
video conferences, would help. 

The networking requests go beyond the wish to trade research 
methods and results. Scientists in South America want to know how 
to select the best people and how to improve coordination between 
universities and industry. Many called for help in improving sci- 
ence-evaluation processes (see page 209). In Brazil, for example, 
assessments too often reward quantity over quality. 

Investments in sending researchers back and forth can yield long- 
term dividends. In 1990, Argentine molecular biologist Eduardo 
Arzt started a fellowship at the Max Planck Institute for Psychiatry 
in Munich, Germany. After returning to Argentina, Arzt continued 
to collaborate with Max Planck colleagues — a connection that 
was key when the society was looking to expand its international 
programs. In 2011, it established its first South American partner 
institute in Buenos Aires, run jointly with Argentina’s Council for 
Scientific and Technological Research, and with Arzt as director. 
Several of the research groups at the institute are led by Argentine 
scientists lured back from overseas by the opportunity to do top- 
tier science. 

Football fans in South America are used to 
seeing top players leave for abroad. Efforts to 
reverse the flow, in science as in sport, face great 
challenges. But they are a worthwhile goal.
Jordan’s stem-cell law can guide the Middle East

A ban on private companies using stem cells from human embryos provides a
policy framework for other Arab and Islamic countries, says Rana Dajani.

In January, Jordan passed a law to control research and therapy
using human stem cells derived from embryos, the first such
regulation in the Arab and Islamic region. I was part of the group
headed by Abdalla Awidi Abbadi, director of the Cell Therapy Center
at the University of Jordan in Amman, that initiated the call for the
law and later drafted it. Stem-cell research is a hot topic for Jordan
because of the kingdom’s status as a health-care hub that draws patients
from abroad. It is already one of few countries in the Middle East with
regulations for protecting people who participate in clinical trials. This
latest law should serve as an example to other countries in the region.

The new rules ban private companies from using human embryonic
stem (ES) cells in research or therapies. Such work will be allowed only
in government organizations or publicly funded academic institutions
in Jordan, which have higher levels of transparency than private firms
and are supervised by the health ministry and a specialized committee.
The law also bans payment for donations of stem cells and eggs, and says
that modified and manipulated cells are not to be used for human
reproduction. There is no current research on human ES cells in Jordan;
this is a pre-emptive step.

Much of the controversy and disagreement over work on stem cells
worldwide arises from the different views of the major religions on the
earliest stages of life. Although the use of human ES cells is opposed by
the Roman Catholic Church and some Protestant denominations, it is
generally supported by the Jewish community and accepted in many Muslim
countries. There is no consensus on when human embryonic life begins,
but the majority of Muslim scholars consider it to start 40–120 days
after conception and therefore hold the view that a fertilized egg up to
5 days old has no soul: it is not ‘human life’ but ‘biological life’. So
for many, there is no ethical problem in the Islamic faith with using an
early embryo to produce stem cells.

Such conclusions are not easy to reach. Many Muslim countries consider
legislation and bioethics principles to be based on three pillars of
Islamic law. The first is the Quran. The second is Sunnah, or the
legislative decisions of the Prophet Muhammad. The third is ijmaa (the
consensus of Muslim scholars) and ijtihad, the concept that every
adequately qualified scholar has the right to independently solve problems.
On the basis of these pillars, Iran, Saudi Arabia and Tunisia have drawn
up guidelines on stem-cell research, but they are not legally binding.

Jordan’s stem-cell law is the product of years of discussions by
committees comprising scientists, physicians, Arabic-language experts,
lawyers and Muslim and [...]
resolved. We consulted with both the National Committee for Science
and Technology Ethics and the education ministry. The final law was 
approved by the council of Muslim scholars, the Majlis Al-Iftaa. 

The council agreed with a 2003 decision (fatwa) by Muslim scholars 
that allows the use of human ES cells from permissible sources — 
including legally produced excess fertilized eggs from in vitro fertili- 
zation. The decision to ban private companies from using these cells 
was driven by concerns that the work would encourage termination of 
pregnancies, which is illegal in Jordan unless the mother’s life or health 
is at risk. The council was clear that the new law must forbid human 
reproductive cloning and should not allow embryos to be created from 
the sperm and eggs of unmarried couples. 

The distinction drawn between the various sources of stem cells 
earlier in the discussion process allowed the 
Majlis Al-Iftaa to take a more permissive approach 
to techniques using stem cells that are not derived 
from human embryos. For example, somatic-cell 
nuclear transfer (in which a patient's DNA is trans- 
planted into an unfertilized human egg that has 
no nucleus) and induced pluripotent stem cells, 
which are made from adult cells, can be worked on 
by the private sector under the new rules. 

The therapeutic use of bone-marrow trans- 
plantation — including transplants of blood- 
forming stem cells — is well established in 
Jordan. Such procedures are already regulated 
by existing laws on medical practice, so the new 
law makes a clear distinction between these 
techniques and human ES-cell therapy. 

The legislation not only covers all current 
aspects of stem-cell research and use, but also 
leaves room for later modification. It mandates 
the creation of a national committee that, among other things, will take
responsibility for laying out specific regulations for stem-cell banking 
in accordance with international standards. 

All our discussions in Jordan have concluded that stem-cell research 
is permissible in Islam, as long as it is carried out to improve human 
health and takes precautions to respect human life. Still, as the field 
develops, policy-makers must continue to invest in education and 
raise awareness of the opportunities, challenges and uncertainties of 
human ES-cell research. 

The scientific output of the Islamic Arab region is low compared 
with that of other regions. Implementation of these laws in Jordan 
and other Muslim countries could help to encourage research to reach 
international standards and start to bridge that divide.


Rana Dajani is associate professor of molecular cell biology at the 
Hashemite University in Zarqa, Jordan. 
e-mail: rdajani@hu.edu.jo 
Immune cells
targeted in cancer 


In a small, early-stage clinical
trial, an antibody seems to 
slow the growth of tumours 
by decreasing the number of 
cancer-boosting immune cells 
in and near the tumours. 

Some immune cells known 
as macrophages promote 
tumour growth and are 
regulated by a protein, CSF-1, 
and its receptor. Carola 
Ries at Roche in Penzberg, 
Germany, and her colleagues 
produced an antibody that 
blocks this receptor and tested 
it in seven patients with a 
rare cancer of the joints. The 
researchers found that the 
antibody lowered the number 
of macrophages in one patient 
from whom a biopsy was taken, 
and shrank tumours in five of 
the patients. In people with 
other types of tumours, the 
antibody also depleted tumour- 
associated macrophages and 
shifted the ratio of another type 
of immune cell, T cells, towards 
those that fight tumours. 

Targeting macrophages, 
in combination with other 
chemo- or immunotherapies, 
could improve treatment, but 
further testing in humans is 
needed, the authors say. 

Cancer Cell http://doi.org/s3m 
(2014) 


Easy monitoring of 
drug by camera 


By adding a drug-sensing 
molecule to human blood 
samples, researchers can 
measure drug levels with a
simple digital camera. 
Monitoring drug amounts 
in patients can avoid side 
effects, but the process requires 
specialized resources. Now, Kai 
Johnsson at the Swiss Federal 
Institute of Technology in 
Lausanne and his colleagues
have used a digital camera
and software to quantify
blood levels of a cancer drug
bound to a specially designed 
bioluminescent sensor protein. 
The sensor, which changes 
from red to blue (pictured) 
with increasing drug levels, can 
be tailored to other drugs, and 
could allow easy, low-cost drug 
monitoring by physicians and 
patients, the authors say. 

Nature Chem. Biol. http://doi.org/ 
s5b (2014) 


Skin sensor 
soothes psoriasis 


A protein in the skin that senses 
environmental signals could be 
enlisted to fight inflammation 
caused by the autoimmune skin 
disease psoriasis. 

Brigitta Stockinger at the 
MRC National Institute for 
Medical Research, London, and 
her team found that altering 
the activity of the protein AhR 
in human skin affected the 
expression of 41 genes that are 
relevant to psoriasis. 

Mice lacking AhR had 
a stronger response to 
imiquimod, a compound 
that causes psoriasis-like 
skin inflammation. However, 
stimulating AhR in normal 
mice reduced imiquimod’s 
effects, suggesting that AhR 
activation may ease psoriasis. 
Immunity http://doi.org/s4m 
(2014) 


PHYSICS 


Another source for 
static electricity 


Physicists have debunked a 
three-decades-old explanation 
for how grains of the same 
material rub together to 
generate static electricity — an 
effect seen, for example, in 
volcanic ash clouds. 

One theory posited that
because larger grains hold more
trapped, high-energy electrons,
they redistribute the electrons
to smaller grains when two
touch, creating static electricity.
Heinrich Jaeger from the
University of Chicago, Illinois,
and his colleagues measured
the surface density of trapped
electrons around different-sized
grains of zirconium
dioxide silicate, as well as the
grains’ charge. The authors
found that there are far too few
trapped electrons to account
for the observed static build-up
when the grains are mixed.

Instead, other charged
particles, such as ions from
water films or from the
surrounding atmosphere,
could accumulate on the grains’
surface and be responsible for
the effect, the team suggests.
Phys. Rev. Lett. 112, 218001
(2014)


SOCIAL SELECTION


Papers predict future lab heads


Scientists at every point on the career spectrum are talking
about a paper in Current Biology that takes a quantitative view
of the mantra ‘publish or perish’.

Using a sample of more than 25,000 researchers, Lucas
Carey at Pompeu Fabra University in Barcelona, Spain, and his
colleagues developed a statistical model that predicts who will
eventually become principal investigators. The team found that
first authors of papers in high-impact journals have the inside
track, and everyone else is likely to lag behind. Verena Seufert, a
geographer and PhD candidate at McGill University in Montreal,
Canada, tweeted that it was a “sad story from a cool paper”.
Van Dijk, D., Manor, O. & Carey, L. Curr. Biol. 24, R516–R517 (2014)

Based on data from altmetric.com.
Altmetric is supported by Macmillan
Science and Education, which owns
Nature Publishing Group.


Crow brain 
recalls images 


The exceptional cognitive 
abilities of crows could be 
partly due to a structure in their 
brain that can temporarily
retain visual information. 

To see whether crows have 
aspects of working memory 
— the ability to remember 
information for future tasks 
— Andreas Nieder and his 
team at the University of 
Tübingen, Germany, trained
four carrion crows (Corvus
corone, pictured) in a task
that required them to recall 
images 1 second after first 
seeing them. During this task, 
the team recorded the activity 
of 662 individual neurons in a
region of the brain called the 
nidopallium caudolaterale, 
which is thought to correspond 
to the mammalian prefrontal 
cortex — an area involved in 
higher-order thought. 

The neurons seem to encode
and maintain information
about the image during this
time delay, suggesting
that this brain area is
involved in the visual
component of
working memory.
J. Neurosci. 34, 7778–7786
(2014)
Destination Mars
To revive the moribund
US human spaceflight
programme, NASA should
plot a course for Mars, says
the US National Academy of
Sciences in a report published
on 4 June. The report outlines 
plans for putting humans on 
Mars sometime between 2037 
and 2050 at a cost of hundreds 
of billions of dollars. It 
criticizes the agency's current 
strategy of seeking to visit an 
asteroid put in orbit around 
the Moon, warning that it 
will end in the “loss of the 
long-standing international 
perception that human
spaceflight is something the
United States does best”. See
go.nature.com/wp4zdl for more.


Forest genetics 


Many countries do not know 
enough about the genetic 
make-up of the trees growing 
in their native forests, finds
a report from the Food and
Agriculture Organization
of the United Nations
(FAO). The 86 countries that
contributed to the analysis 
provided genetic information 
for only around 600 species 
out of a maximum of 100,000 
shrubs and trees thought to 
be growing around the globe, 
the report says. In its analysis, 
The State of the World’s Forest 
Genetic Resources, published 
on 3 June, the FAO calls on 
governments to improve data 
gathering and research to help 
manage tree species. 


El Niño watch
The US National Oceanic and
Atmospheric Administration
predicts that there is an
80% chance that an El Niño
event (a periodic warming
of waters in the eastern
equatorial Pacific Ocean)
will occur this autumn or
winter. The forecast, made
on 5 June, also predicts a 70%
chance that an El Niño will
occur this summer, up from
a 50% chance predicted in
March. The US forecasters
suggest a moderate-strength
El Niño, which could scramble
global weather patterns until
it ebbs.


Nature in close-up


This microscopic image shows the forest-like arrangement of
hairs on a gecko’s toe that gives the animal its gravity-defying
ability to scurry across ceilings. Each foot has hundreds of
thousands of these hairs, called setae, which fray into smaller
hairs with split ends called spatulae. The hairs’ strong grip
has inspired the design of medical adhesives. This image was
taken by Dennis Kunkel, a photomicrographer in Hawaii. It
is part of Life: Magnified, an exhibition of scientific images on
display at Washington Dulles International Airport's Gateway
Gallery from June to November. Other images include a
bacterium being swallowed by an immune-system cell and
chromosomes lining up for cell division.


FACILITIES 


SOFIA saved 


The world’s biggest flying 
telescope, the Stratospheric 
Observatory for Infrared 
Astronomy (SOFIA), was
given a new lease of life on
5 June. The US Senate voted
to give US$87 million in
the 2015 fiscal year to the
observatory — a modified 
Boeing 747 that carries a 
2.5-metre telescope. The 
funding boost could rescue 
SOFIA; in March, NASA had 
proposed effectively cancelling 
the project because of its high 
operating costs. SOFIA is a 
joint venture with the German 
Aerospace Center and became 
fully operational in February.
The Senate and the House of 
Representatives must now 
agree on the budget. 


Telescope pull-out 


Germany is pulling out of the 
world’s largest radio telescope, 
the Square Kilometre Array 
(SKA), scheduled to be 
completed in South Africa
and Australia by the mid-2020s.
Germany’s research
ministry announced the
move on 5 June, citing a tight
budget, according to the SKA
Organisation. The decision
will take effect on 30 June 2015.
The pull-out is “disappointing,
but not catastrophic” for SKA’s
ability to secure funding,
says Philip Diamond,
director-general of the SKA 
Organisation near Manchester, 
UK. See go.nature.com/cuacno 
for more. 


Stem-cell patents 
A US federal court threw
out a legal challenge to a key
embryonic-stem-cell patent
on 4 June. The non-profit
advocacy group Consumer
Watchdog in Santa Monica,
California, had argued that
the patent was invalid because
the supposed invention was
merely a product of nature.
The US Court of Appeals
for the Federal Circuit ruled
that Consumer Watchdog 
could not challenge the 
patent, which is owned by the 
Wisconsin Alumni Research 
Foundation in Madison, 
because the group does not 
use embryonic stem cells and 
was not directly harmed by 
the patent. 


Pharma buy-out 

US pharmaceutical giant
Merck agreed to pay
US$3.85 billion for Idenix, a
developer of hepatitis C virus
therapies in Cambridge,
Massachusetts. The price tag,
agreed on 9 June, was more
than three times the Idenix
stock listing at the close of
trading on the previous day.


Fossil sentence 

An American fossil dealer was 
sentenced to three months’
imprisonment on 3 June, after
pleading guilty to smuggling
dinosaur fossils into the
United States, including
a 70-million-year-old
Tyrannosaurus bataar skeleton 
from Mongolia. Eric Prokopi 
received a reduced sentence 
for helping prosecutors to 
recover more than 17 other 
dinosaur fossils. The stolen 
Tyrannosaurus skeleton, 
which sold at auction for 
more than US$1 million, was 
returned to Mongolia in May 
2013. Mongolia will open a
dinosaur museum to house the 
returned fossils. 


Chemist dies 
Alexander ‘Sasha’ Shulgin,
a chemist famed for
synthesizing and analysing
novel psychoactive
compounds, died on 2 June
at the age of 89. Shulgin
(pictured) earned his PhD
in biochemistry from the
University of California,
Berkeley, and went on to work
for US firm Dow Chemical,
where he developed one of the
first successful biodegradable
pesticides, mexacarbate.
Later he began to develop
psychedelic drugs and tested
their activity on himself and
a small group of friends.
He published numerous
academic studies and books
on the subject, and remained a
respected scientist throughout
his career.


TREND WATCH


The fatality rate for cases of the
Middle East respiratory syndrome
(MERS) coronavirus in Saudi
Arabia is 41%, not 33%, according
to figures released on 3 June.
The country’s health ministry
(which a day earlier sacked deputy
health minister Ziad Memish, a
key figure in the nation’s efforts
to contain the virus) said that
it had retrospectively identified
113 extra cases, and announced
“new standards” for reporting
the disease. Of 815 MERS cases
reported worldwide by 4 June,
84% were in Saudi Arabia.


BRAIN plan 


The US National Institutes 
of Health laid out its ten- 
year plan for the Brain 
Research through Advancing 
Innovative Neurotechnologies 
(BRAIN) initiative on 5 June. 
It proposed that Congress 
grant the initiative a further 
US$4.5 billion for 2016-25. 
According to the plan, the
first five years will be spent
developing technologies
to record, analyse and
manipulate the brain. In
the following five years,
researchers will use those
technologies to study how
the brain’s circuits lead to
behaviour and cognition.


High-seas value 


International waters store 
around 500 million tonnes 
of atmospheric carbon per 
year, providing an ‘ecosystem 
service’ to humans that is 
worth up to US$222 billion 
annually, according to the 
first economic assessment of 
the high seas. The report was 
published on 5 June by the 
Global Ocean Commission, 
a non-governmental 
organization in Oxford, 

UK. It adds that 10 million 
tonnes of fish are caught 
each year in international 
waters, generating more than 
$16 billion. On 24 June, the 
commission will publish 
proposals for protecting the 
ocean. 


Microbead ban 
Illinois became the first US 
state to ban the manufacture 
and sale of personal-care 
products that contain plastic 
microbeads. Environmental 
scientists say that the non- 
biodegradable beads, used as 
exfoliating agents, pass through 
sewage systems and build up in 
waterways, where they absorb 
toxic chemicals and threaten 
aquatic life. A law signed by
Governor Pat Quinn on 8 June
prohibits the manufacture
of soaps, cosmetics and
medications containing the
beads by 2018 and the sale
of these products by 2019.
At least four other states are
considering similar bills.


SAUDI ARABIA FINDS NEW CORONAVIRUS CASES

Health ministry reports retrospective discovery of 113 extra cases of
Middle East respiratory syndrome (MERS), of whom 92 died.


SEVEN DAYS | THIS WEEK


17-19 JUNE
Space scientists meet
in Chicago, Illinois, for
the annual research
conference on the
International Space
Station.
go.nature.com/wnatte


19 JUNE
The peak of the 3,000-
metre Cerro Armazones
mountain in northern
Chile will be blown off
to make a home for the
European Extremely
Large Telescope.
go.nature.com/u6xrrb


Carbon cap 

China could set its first 
absolute cap on fossil-fuel 
emissions from 2016. On 

3 June, international media 
reported that He Jiankun, a 
senior government adviser on 
climate change, told a meeting 
in Beijing that the Chinese 
government may outline 

the cap in its next five-year 
economic plan, for 2016-20. 
He later clarified that the idea 
was his personal view. China 
is the world’s biggest emitter of 
carbon dioxide. 


Development goals 
A United Nations working 
group led by Hungary and 
Kenya has drawn up a list of 
17 sustainable development 
goals and started negotiating 
on them this week. The goals 
will replace the Millennium 
Development Goals, which 
expire next year and include 
targets for eradicating hunger 
and poverty and halting 
biodiversity loss. The United 
Nations aims to finalize the 
new goals next year. 


Defective brain neurons are responsible for the mobility problems seen in people with Parkinson’s disease.


REGENERATIVE MEDICINE 


Fetal-cell revival 
for Parkinson’s 


Moratorium on controversial therapy lifted as stem cells 
emerge as alternative source of treatment. 


BY ALISON ABBOTT 


A neurosurgery team will next month
transplant cells from aborted human
fetuses into the brain of a person with
Parkinson's disease. The operation breaks a
decade-long international moratorium on the 
controversial therapy that was imposed after 
many patients failed to benefit and no one 
could work out why. 
But the trial comes just as other sources of 


replacement cells derived from human stem 
cells are rapidly approaching the clinic. And 
this time, scientists want to make sure that 
things go better. So the teams involved in all 
the planned trials have formed a working 
group to standardize their research and clini- 
cal protocols in the hope that their results will 
be more easily interpretable. 

People with Parkinson's disease suffer from 
a degeneration of neurons that produce the 
neurotransmitter dopamine, which is crucial for 
normal movement. This often leaves patients
with severe mobility problems. Standard treat- 
ment includes the drug L-dopa, which replaces 
dopamine in the brain but can cause side 
effects. The cellular therapies aim to replace the 
missing neurons with dopamine-producing 
(dopaminergic) cells from fetal brains or with 
those derived from human stem cells. 

The moratorium on replacement-therapy 
trials was introduced in 2003 because the early 
fetal-cell studies had produced varying results 
that were impossible to interpret. 

“We want to avoid a repeat of this situation,”
says neurologist Roger Barker at the University 
of Cambridge, UK, who helped to organize the 
working group’s inaugural meeting in London 
last month. The group, known as the Parkin- 
son's Disease Global Force, includes scientists 
from the European, US and Japanese teams 
about to embark on the trials. At the meeting, 
they pledged to share their knowledge and 
experiences. 

The first human transplantation of fetal 
brain cells took place in 1987 at Lund Uni- 
versity in Sweden, where the technique was 
pioneered. Surgical teams took immature fetal 
cells destined to become dopaminergic neu- 
rons from the midbrain of aborted fetuses and 
transplanted them into the striatum of patients’ 
brains, the area of greatest dopamine loss in 
Parkinson's disease. 

More than 100 patients worldwide received 
the therapy as part of clinical trials before the 
moratorium. “But centres used different pro- 
cedures and protocols — it was impossible to 
work out why some patients did very well and 
others didn’t benefit at all,” says Barker.

In 2006, Barker, together with neuroscientist
Anders Björklund at Lund University, set
up a network to bring together the original 
seven teams that had performed the trans- 
plants, to assess all protocol details and patient 
data retrospectively. 

The teams worked out that the procedure 
tended to be most effective in patients who 
were relatively young and whose disease was at 
an early stage. In addition, post-mortem analy- 
sis of patients’ brains showed that those who 
benefited most had at least 100,000 dopamine- 
producing cells of fetal origin integrated into 
their brains. Cells from at least three fetuses are 
needed to achieve these numbers, the neuro- 
scientists concluded. 

The retrospective analysis encouraged the 
European scientists, including Barker and
Björklund, to launch a new trial, which
is funded by the European Union, involv- 
ing fetal dopaminergic-neuron transplants. 
Known as TRANSEURO, it will monitor 
disease progression in 150 patients in the 
United Kingdom, Sweden, France and 
Germany. The first patient is due for trans- 
plantation next month at Addenbrooke's 
Hospital in Cambridge. In line with the 
retrospective findings, the average age of 
trial participants at recruitment was 55, and 
their average disease duration just 4 years. 
None had displayed dyskinesias — uncon- 
trolled muscle movements that can be a side 
effect of L-dopa treatment. 

But stem-cell biology has advanced 
significantly since 2003, and dopaminergic 
neurons can now be derived from human 
embryonic stem cells and also from induced 
pluripotent stem cells — mature cells that 
have been rewound to an uncommitted 
stem-cell-like state and that can be coaxed 
to become a cell type of choice. These poten- 
tial sources are more desirable than those 
derived from fetuses, because fetal cells are 
hard to come by and their biology varies. 

Research is under way to ensure that 
the stem cells develop into the exact type 
of dopaminergic cell needed to treat 
Parkinson's and that they become correctly 
integrated into recipients’ brains. But pro- 
gress has been so fast that clinical trials are 
already on the horizon. A Japanese trial, 
using induced pluripotent stem cells, is 
planned to start in Kyoto within two years; 
and two trials using human embryonic 
stem cells are also planned, one to begin 
within three years in New York and the 
other in Europe within four to five years. 

The Parkinson's Disease Global Force 
hopes that its joint planning will make 
comparing outcomes easier. Members will 
share their protocols for deriving and graft- 
ing cells, as well as their clinical criteria for 
patient selection and follow-up. 

They see the TRANSEURO trial as a 
pathfinder. “We don't know yet which 
source of cell will turn out to be the best, but 
right now the fetal cell is the gold standard 
we need to match,” says neurologist Claire
Henchcliffe from the Weill Cornell Medical 
Center in New York, who is coordinating 
the working group’s guidelines on patient 
assessment and trial design. 

The stem-cell approaches have a long 
way to go before they can rival the promise 
of fetal cells, says Lund University stem- 
cell biologist Malin Parmar, a member 
of the European clinical-trial team. That 
is because the cells from fetal brains are 
already on the way to becoming mature 
dopaminergic cells. “The human body 
knows very well how to develop each 
cell type from the embryo,” she says. “We 
haven't learnt all of these secrets yet, but we 
have learnt some major ones.”
Space-station
science ramps up 


NASA pushes research agenda in face of Russian resistance. 


BY ALEXANDRA WITZE 


In January, when the United States proposed
extending International Space Station
(ISS) operations until 2024, the world
was a very different place. That was before
Russian military intervention in Ukraine,
before US-Russian relations foundered
and before Russian deputy prime minister
Dmitry Rogozin suggested that US astronauts
use a trampoline to get themselves to orbit (see
Nature http://doi.org/s4f; 2014).

Rogozin also suggested last month that
Russia would stop participating in the space-
station programme after its scheduled end
date of 2020. That statement did not set official
government policy, but given Russia's key role
in the orbiting outpost it cast a shadow over
hopes for the four-year extension.

With the clock ticking, the race is on
to conduct as much science as possible in
whatever time the space station has left. At
a conference next week in Chicago, Illinois,
NASA scientists will try to lure researchers
who have not worked with near-zero-gravity
conditions before. The goal is to get them to
propose anything from the usual research
agenda — such as protein crystallization and 
human physiology experiments — to basic 
biomedical research and Earth-science obser- 
vations that can take advantage of the high- 
flying platform before its mission ends (see 
‘Research push’). “There's never been anything
like it,” says Julie Robinson, NASA’s space-
station research chief at the Johnson Space 
Center in Houston, Texas. “It’s like a university, 
all together with all the disciplines — I don't 
know if we'll see that again.” 

More than 1,600 scientists from 69 coun- 
tries have contributed to experiments carried 
out on the space station since its first module 
was launched in 1998. The United States is the 
largest science user. But over the years, many 
have questioned the value of the science done 
in orbit. One main goal is to help humans to 
endure long-duration spaceflight, but early 
experiments often failed. For instance, NASA 
astronauts would spend hours a day exer- 
cising on treadmills to slow down muscle 
wasting and bone loss — to little avail. Force 
measurements revealed that they were subjecting
their bodies to stresses that were not
even close to the pull of gravity on Earth.


RESEARCH PUSH

The United States has proposed
extending the International Space
Station’s lifetime by four years, to 2024,
to conduct more scientific research.
This year, the station will see the arrival
of multiple science payloads.

Launch of Orbital Sciences cargo ship, which includes student experiments.

Launch of Automated Transfer Vehicle cargo ship and experiment supplies.

Launch of SpaceX ship, including rodent habitat and RapidScat ocean-wind monitor.

Launch of SpaceX cargo ship with cloud-aerosol laser instrument.

Another goal has been to conduct basic 
scientific observations to see how physical 
and biological processes change in a near- 
weightless environment. But these studies 
have often been limited to growing relatively 
unimportant proteins or running student 
experiments, and have often not made fun- 
damental breakthroughs. In a sharply critical 
2011 report, a US National Research Council 
panel argued that NASA was “poorly posi- 
tioned to take full advantage of the scien- 
tific opportunities offered by the now fully 
equipped and staffed ISS laboratory”. 
Space-station research suffered further 
after NASA halted the space-shuttle pro- 
gramme in 2011, eliminating the only option 
for both delivering experiments to orbit and
returning them to Earth. 

But in late 2012, the debut of two-way cargo 
ships made by private company SpaceX of 
Hawthorne, California, again allowed sam- 
ples to be returned to Earth. That develop- 
ment enabled experiments such as those 
of David Klaus, an aerospace engineer and 
gravitational biology researcher at the Uni- 
versity of Colorado Boulder, who is explor- 
ing why antibiotics seem to work less well in 
space than on the ground. The answers might 
help in making the drugs perform better on 
Earth, he says — and running operations to 
2024 would allow him more generations of 
experiments, with the results of each guiding 
the design of the next. 

“Extending the station out that long helps us 
align a bit better with research portfolios here 
on the ground,” says Duane Ratliff, chief operating
officer at the Center for the Advancement
of Science in Space, an organization in
Melbourne, Florida, that manages US space- 
station research for NASA. 

Another strategy is to mount Earth-science 
experiments on the space station as a cheaper 
alternative to putting them on a free-flying 
satellite. Some time in August or later, an 
instrument that monitors ocean winds will be 
flown to the station to replace a satellite-borne 
one that failed in 2009. And a supply run in 
September is slated to deliver a laser system 
to measure clouds, dust and pollution in the 
Earth’s atmosphere. 

The agency is also adding what it hopes 
will be versatile facilities that produce data 
for a broad range of research. The August 
supply run will carry a rodent habitat — 
the largest ever launched for long-duration 
spaceflight, says Robinson, with a capacity of 
40 mice. And a series of experiments called 
geneLAB will send a range of model organ- 
isms, including fruit flies and nematodes, 
into space for months at a time, performing 
basic biomedical assays on them both in orbit 
and after returning them to Earth. The accu- 
mulated information will go into a database 
that any ground-based researcher will be able 
to draw on. 

“People will come and do one to two experi- 
ments in space,” says Robinson, “and continue 
to do work in their lab for another 30 years to 
understand that insight.”
Demonstrators protest against gene patenting outside the US Supreme Court last April.


Cancer-gene data 
sharing boosted 


Efforts to get more breast-cancer gene variants into public 


databases are gaining ground. 


BY ERIKA CHECK HAYDEN 


When the US diagnostics giant Myriad
Genetics had its legal monopoly
on breast-cancer gene testing
eliminated one year ago, the company still
retained an enormous edge over competitors. 
Although the US Supreme Court’s ruling last
June invalidated the patenting of genes, and 
with it Myriad’s exclusive rights on two genes 
associated with breast- and ovarian-cancer risk, 
the firm still has a private trove of data from 
1.3 million genetic tests. 

That information gives Myriad, of Salt 
Lake City, Utah, an advantage in interpret- 
ing test results on these genes. 

But a coalition of scientists, physicians, 
patients and genetic counsellors says that it 
will soon eliminate that advantage. A year 
after the Supreme Court invalidated the 
patenting of genes — and with it, Myriad’s 
monopoly on testing for mutations in the 
BRCA1 and BRCA2 genes linked to breast
and ovarian cancer — the number of entries 
for BRCA variants in ClinVar, a public data- 
base for clinical genetic data, has grown 
to around one-third of the number in the 
Myriad database. Leaders of the public effort
say that it is a showcase for how scientists 
can clear long-standing obstacles to sharing 
genetic information. 

“My guess is that within a year’s time, we 
would have a more robust data source for 
BRCA1 and BRCA2 than Myriad has by itself 
today,” says geneticist Heidi Rehm, head of 
the Laboratory for Molecular Medicine at the 
non-profit company Partners HealthCare in 
Cambridge, Massachusetts, and a central 
figure in the BRCA data-sharing drive. 


PUBLIC VERSUS PRIVATE 
Myriad counters that public databases are 
unreliable, expensive and vulnerable to fund- 
ing cuts that compromise their upkeep. “We 
have the highest-quality databases in the 
world,” says company spokesman Ronald 
Rogers. “And that’s important because when 
the patient is given a result, they're going to 
make a medical management decision based 
on that information.” Rogers says that the firm 
spent US$500 million to develop its tests and 
database. 

In genetic testing, the size of the reference 
database matters: the bigger it is, the more 
useful it becomes for interpreting the results
of any individual gene test. Of the thousands of
possible variations on the spelling of the DNA
bases that make up the BRCA genes, only some
will be linked to cancer. If a particular variant is
significantly more frequent among those who 
develop cancer than in the general population, 
its contribution to disease risk can be calcu- 
lated. Only by collecting data from many dif- 
ferent people can scientists observe the same 
variants often enough to make these kinds of 
calculations with confidence. 
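
The calculation described above can be sketched as a simple case-control comparison. The Python snippet below is a minimal illustration, not the method used by Myriad or by ClinVar contributors: it estimates an odds ratio with an approximate 95% confidence interval (Woolf's logit method, with a 0.5 continuity correction added to every cell). The function name and all counts are hypothetical.

```python
import math


def odds_ratio_ci(case_carriers, case_total, control_carriers, control_total):
    """Return (odds_ratio, ci_low, ci_high) for a variant in a case-control sample.

    Uses Woolf's logit interval with a 0.5 continuity correction on every
    cell, so that rare variants with a zero count do not break the estimate.
    """
    if case_carriers > case_total or control_carriers > control_total:
        raise ValueError("carrier counts cannot exceed group totals")

    a = case_carriers + 0.5                       # cases carrying the variant
    b = case_total - case_carriers + 0.5          # cases without it
    c = control_carriers + 0.5                    # controls carrying the variant
    d = control_total - control_carriers + 0.5    # controls without it

    ratio = (a * d) / (b * c)
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - 1.96 * std_err)
    high = math.exp(math.log(ratio) + 1.96 * std_err)
    return ratio, low, high


if __name__ == "__main__":
    # Hypothetical counts: the same carrier frequencies (3% of cases,
    # 1% of controls) seen in a small and in a large reference set.
    for label, counts in [("small", (3, 100, 1, 100)),
                          ("large", (300, 10_000, 100, 10_000))]:
        ratio, low, high = odds_ratio_ci(*counts)
        print(f"{label}: OR={ratio:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

With the small sample the interval spans 1, so the variant cannot be distinguished from a harmless one; with the large sample the interval sits well above 1. That is the sense in which a bigger reference database supports more confident interpretation.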

“If Myriad holds the data hostage in a pro- 
prietary database, they're harming patients,” 
says Sherri Bale, managing director of 
GeneDx in Gaithersburg, Maryland, one of 
several competitors being sued by Myriad for 
infringement of other patents associated with 
the BRCA tests. These companies have cut the 
price of BRCA tests to an average of $2,200 
compared with Myriad’s $4,040 — and are con- 
tributing an accelerating flow of BRCA data to 
ClinVar, which is held at the National Center 
for Biotechnology Information in Bethesda, 
Maryland. 

“All of these companies seem much more 
willing to share data,” says genetic epidemiologist
David Goldgar of the University of Utah’s
Huntsman Cancer Institute in Salt Lake City. 

Geneticists such as Rehm and Robert Nuss- 
baum at the University of California, San 
Francisco, are urging patients, genetics profes- 
sionals and insurance companies to use only 
BRCA-testing companies that share data. 

Geneticists have long exhorted their col- 
leagues to share, but have been blocked by 
practical and competitive barriers. Sharing 
takes time and money, and geneticists fear 
compromising patient privacy or seeing com- 
petitors beat them to market or publication 
with results based on their own data. An initiative
started by Nussbaum, the Sharing Clinical
Reports Project, aims to bring about a sea
change by coaxing clinical labs to deposit the
results of Myriad diagnostic tests into ClinVar.

ClinVar now holds information on 
5,752 BRCA variants, deposited by Sharing 
Clinical Reports, Myriad competitors and aca- 
demic labs, compared with Myriad’s 16,000. 
And other, broader, plans for data sharing 
are afoot. At a meeting in London in March, 
geneticists and physicians convened by the 
non-profit Global Alliance for Genomics and 
Health, based in Toronto, Canada, which pro- 
motes data sharing, outlined a plan to create an 
even more extensive data resource. Dubbed the 
BRCA Challenge, it would link major clinical- 
genetics databases such as ClinVar and the 
Leiden Open Variation Database, funded by 
the European Union and run by a team in the 
Netherlands. 

“The thought,” Goldgar says, “is that the rest
of the world has roughly the same amount of 
data as Myriad — and if we put it all together 
and make a concerted effort to try to use it, 
then we can be on equal footing.”
Sound clue in hunt for MH370


Hydroacoustic signal caught by sensors in the Indian Ocean may be linked to crash
of Malaysian airliner.


BY DECLAN BUTLER 


Researchers are exploring what may be
the first promising lead in months in
the search for the wreckage of Malaysia
Airlines flight MH370. Sensitive microphones
on the ocean floor off Australia picked
up a distinctive signal at about the time that 
the Boeing 777 aeroplane is believed to have 
crashed in the Indian Ocean. The findings, 
announced by Australian scientists last week, 
offer a rough location for the source of the 
sound and are being followed up by search 
authorities. 

The signal was discovered by a team led 
by Alec Duncan, an underwater acoustics 
specialist at Curtin University’s Centre for 
Marine Science and Technology in Perth, 
Australia. It was recorded at 01:30 coordi- 
nated universal time (UTC) on 8 March; the 
last transmission from flight MH370, an elec- 
tronic ‘handshake’ between the aircraft’s com- 
munication system and a satellite, took place 
at 00:19 UTC, when the plane is estimated to 
have run out of fuel. 

The sound is believed to have originated 
somewhere along a strip running to the north- 
west of the Indian Ocean (see ‘Sound track- 
ers’). That is out of the range of the current 
search, which was determined by analysis of 
the satellite communication data and is being 
led by the Australian Transport Safety Bureau 
(ATSB). However, the techniques used are 
well-established. “The ATSB will continue to 
discuss the analysis of this information with 
Curtin University for the purposes of inform- 
ing the search,” says a spokesperson for the 
Joint Agency Coordination Centre (JACC) in 
Canberra, which is coordinating the Australian
government’s support for the search.

The Curtin team emphasizes that the sound
may have come from other sources, such as a
small earthquake, but thinks that the lead is
worth pursuing. It is now preparing to retrieve 
more hydroacoustic data from the ocean off 
northwestern Australia. 

Duncan’s team found the signal while 
analysing data from an acoustic station in Perth 
Canyon about 40 kilometres west of Rottnest 
Island near Perth. It is one of six stations oper- 
ated by Australia’s Integrated Marine Observing 
System (IMOS), which was set up to make phys- 
ical, chemical and biological observations of the 
ocean basin. Duncan then confirmed the signal
using data from an acoustic station off Cape
Leeuwin on the southwest tip of Australia that
were provided by the Comprehensive Nuclear-
Test-Ban Treaty Organization (CTBTO).

Hydrophones run by the Comprehensive Nuclear-Test-Ban Treaty Organization track explosions in the sea.

The CTBTO, a nuclear-test monitoring body
based in Vienna, maintains a global network of
seismic and radioisotope detectors, as well
as other instruments. The network includes
six hydrophone stations that monitor
for explosions in the ocean, but can also
pick up other sounds
such as whale calls at great distances. Cape
Leeuwin is one of two CTBTO acoustic stations
in the Indian Ocean. Duncan also analysed
data from the other, off Diego Garcia island
in the middle of the Indian Ocean, but found
nothing. The data were too polluted by noises
from seismic surveys, he says.

The IMOS stations have just one microphone 
each, so alone cannot provide detailed infor- 
mation on the direction of sounds. But the 
CTBTO’s stations have two sets of three hydro- 
phones separated by several kilometres, which 
— like a pair of human ears — allow listeners 
to get a fix on a sound's direction to within 0.5°.
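
How separated sensors yield a direction can be illustrated with a textbook time-difference-of-arrival calculation. The sketch below is a simplification under stated assumptions, not the CTBTO's actual processing: it treats the sound as a plane wave reaching two sensors, and assumes a nominal sound speed of 1,500 metres per second in seawater. The function name and the 2,000-metre spacing in the usage example are hypothetical.

```python
import math

# Assumed nominal speed of sound in seawater; the real value varies with
# temperature, salinity and depth.
SOUND_SPEED_M_S = 1500.0


def bearing_from_delay(delay_s, spacing_m, sound_speed=SOUND_SPEED_M_S):
    """Angle of arrival in degrees, measured from broadside to the sensor pair.

    For a distant source, the extra path to the farther sensor is
    spacing * sin(angle), so angle = asin(sound_speed * delay / spacing).
    """
    ratio = sound_speed * delay_s / spacing_m
    if abs(ratio) > 1.0:
        raise ValueError("delay is too large for this sensor spacing")
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    spacing = 2000.0  # hypothetical separation between two hydrophones, in metres
    for delay in (0.0, 0.1, 0.5, 1.0):
        angle = bearing_from_delay(delay, spacing)
        print(f"delay {delay:.1f} s -> {angle:.1f} degrees from broadside")
```

A single station with one microphone has no delay to measure, which is why the IMOS loggers alone cannot give a direction. A wider spacing turns a given angle into a longer, more easily measured delay.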

Duncan and his colleagues now plan to 
recover and analyse data from the two IMOS 
stations off northwestern Australia. But those 
loggers record only 5 minutes of sound every 
15 minutes, and any signals are likely to be 
contaminated with noise from seismic surveys, 
says Duncan. He reckons that there is only a 
“slight chance” that their data will contain the 
signal, but that it is “worth a go”. The team had 
planned to recover the sensors in September or
October, but now hopes to make the 7-9-day
round trip in August. “Given the continuing
uncertainty regarding the fate of MH370,
underwater acoustic data still has the possibility
of adding something to the search,” says
Mark Prior, a CTBTO seismic-acoustic officer.


SOUND TRACKERS

The sound of the possible impact of flight MH370 was detected
from acoustic stations at Perth Canyon and Cape Leeuwin. Data
from stations at Scott Reef and Dampier are now being …

Seabed hydroacoustic stations:
CTBTO (Comprehensive Nuclear-Test-Ban Treaty Organization)
IMOS (Australia's Integrated Marine Observing System)

SOURCE: CURTIN UNIVERSITY/ATSB


Meanwhile, it is unclear what other sources
of hydrophone data that could be used in
the search exist in the region. The US Navy
deployed vast arrays of hydrophones on the
ocean floor during the cold war for
anti-submarine warfare. Details of the Sound
Surveillance System (SOSUS) remain secret, but
most of the hydrophones are thought to have
been deployed off the US Atlantic and Pacific
coasts.
Once the cold war ended, however, the
system was downgraded. The data are still 
being collected, but they are not routinely 
analysed unless there is an underwater threat. 
William Marks, a spokesman for the US Navy 
in Yokosuka, Japan, declined to comment on 
whether the United States had hydrophones in 
the region. “Discussions of the SOSUS system 
at that level are classified,” he says. “This is a 
very sensitive system.” 

India and Pakistan also have submarine 
fleets, but Duncan and other scientists say that 
they do not know whether they or any other 
nation has hydrophones in the Indian Ocean. 
“We have not been advised of any hydrophone 
facilities operated by India or Pakistan,” says 
the JACC spokesperson.


CORRECTIONS 

The News story ‘Phage therapy gets 
revitalized’ (Nature 510, 15-16; 2014) 
mischaracterized the CRISPR mechanism 
for tackling antibiotic-resistant microbes. It 
should have said that the phage injects DNA 
into the bacterium, which then transcribes 
it into RNA. And in the News story ‘Chicken 
project gets off the ground’ (Nature 509, 
546; 2014), the mentions of ‘guinea fowl’ 
should have read ‘jungle fowl’. 
STARS OF SOUTH AMERICAN SCIENCE


Growing resources for research and development are
creating opportunities across the continent, but many 
countries still struggle to build their programmes. 


Like the night sky, the overall sweep of
science in South America can look pretty
dark. Brazil is the only country on the
continent that spends more than 1% of
its gross domestic product on research and
development, and even that investment sits far
below what other countries of similar means
are ploughing into science.

But take a closer look at the continent's
scientific enterprise, and bright spots emerge. At
the start of the FIFA World Cup in Brazil, with
billions of people focusing on South America,
Nature examines a part of the world that has
spent too long on the sidelines of science.

A graphic tour on page 202 details the inputs
and products of research and development on
the continent. The region faces many challenges
in terms of building a strong scientific workforce
and boosting resources, but investment
and publications are climbing. A News Feature
on page 204 profiles several key institutions and
research groups, from agricultural specialists
in Colombia to RNA experts in Argentina,
who have gained worldwide recognition.

An Editorial on page 188 calls on international
colleagues to help build South American
science in ways that do not cause young
researchers to leave permanently. On page 213,
a Comment describes one such success: the
Pew Latin American Fellows Program, which
each year sends about ten top graduates to
work in North American labs. More than 70%
return to their native countries, bringing with
them the expertise they have gained. That
initiative is a part of broader efforts, described
in a News Feature on page 207 that examines
how countries are trying to repatriate scientists
who left to train abroad.

As economies on the continent heat up, they
are devoting greater resources to research,
increasing the need for better infrastructure
and policies to support science. In a Comment
on page 209, research leaders describe how they
hope to navigate this growth, and how science
can help to expand their countries’ economies
sustainably. Ideas range from creating a science
ministry to using research to find new commercial
uses for the fruits of the Amazon.

Many researchers in South America maintain
a cautious outlook: they have lived through
periods of intense economic and political
strife in the not-too-distant past. But they also
harbour the hope that the continent's science is
headed for a winning season.


SOUTH AMERICAN SCIENCE

A Nature special issue
SOUTH AMERICA: THE PUBLISHING LANDSCAPE


— but at 4%, it still underperforms 
slightly relative to its 5-6% share of 
By Richard Van Noorden

The expanding economies of South America have led to a significant
rise in scientific output over the past two decades, and research
spending has increased in most countries. But given the region's share
of the world’s population and gross domestic product (GDP), publication
rates still fall short of what would be expected. Research quality has
not kept pace with rising output, and the continent's research papers
still struggle to attract citations from the rest of the world. There
are huge inequalities across the region, too: Brazil dominates the
publication record, for example, whereas Chile takes pole position in
the patent landscape and Argentina scores highly in terms of the
proportion of its population working in science.


NUMBER OF ARTICLES PUBLISHED IN ELSEVIER’S CITATION DATABASE SCOPUS IN 2013


BRAZIL: 46,306
In the past 20 years, Brazil’s
economy has almost tripled in
terms of purchasing power. The
country now accounts for more
than two-thirds of South
America’s entire research output,
although it is broadly similar
to Argentina, Uruguay and Chile
in terms of articles per capita.


VENEZUELA: 1,319 

The only South American 
nation whose scientific 
output is declining: its 
publication tally fell by 29% 
between 2009 and 2013. 


PERU: 1,044


Nearly three-quarters of Peru’s 
articles involve collaborations 
with other countries. The 
most-cited articles include 
work on prevention of HIV, 
tuberculosis and lupus. 


ARGENTINA: 9,337

Has hauled up the impact of its research to just above the
world’s average, outperforming Brazil.


[Chart: South American share of world publications (%)]


CHILE: 6,794

As well as its astronomical observatories, the country
has also found scientific success working on food
crops, such as a highly cited collaboration on the
The hidden continent

South America’s research strength may be underestimated because its
researchers often publish in journals that are not indexed in major
citation databases, such as Elsevier’s Scopus or Thomson Reuters’
Science Citation Index. In 2012, for example, some 6,000 of the roughly
20,000 papers that Brazil published in SciELO (Scientific Electronic
Library Online), a subsidized collection of mainly Latin American
journals, were not indexed in Thomson Reuters’ database. But last year,
Thomson Reuters agreed to create a database for the SciELO index.
COLLABORATION AND EXCELLENCE


South America’s scholarly impact remains relatively low — its citation rate 
last year was around 80% of the world’s average (below). Peru’s articles do 
best, largely because most are co-authored with scientists outside the 
continent. Indeed, the region’s less-developed countries are generally more 
likely to collaborate beyond South America. In Brazil, less than one-quarter 
of its articles in 2008-12 involved such partnerships (right). 


[Chart: Citation impact weighted by research field (1 = world average), showing the South America average for 1996 to 2012]


RESEARCH STRENGTH 


Brazil has more than 100,000 full-time researchers, single-handedly 
providing nearly two-thirds of South America’s science personnel. 
But Argentina has the greatest proportion of researchers, with almost 
3 scientists for every 1,000 workers. 


[Chart: Full-time equivalent researchers per 1,000 labour force, for the United States, China, Argentina, Brazil, Uruguay, Chile, Venezuela, Bolivia, Colombia, Ecuador and Paraguay]


RESEARCH SPENDING 


Argentina and Brazil’s spending on research and development 
(R&D) has shot up even faster than their economies have 
grown. Brazil remains the region’s only country to devote 
more than 1% of its economy to R&D*. 


In 2011, US spending was 2.8% of GDP.


[Chart: Expenditure on R&D as a percentage of GDP; legible country labels are Brazil, Bolivia, Uruguay, Colombia and Paraguay]
*No verified figures for Venezuela, no up-to-date data for Peru. Data are incomplete for Ecuador and Chile.


[Chart legend: collaborations involving other South American nations; other international collaborations; no international collaboration]


Brazil and Argentina are central to co-authorship networks within
South America, and the United States is the top international
collaborator for every nation.
In Brazil, nearly half of research funding comes from the business sector; in other South
American nations, the share from businesses is generally much lower, a stark contrast 
with many industrialized countries. Poor private investment results in a small number of 
patents granted per capita, where South American countries look particularly weak. 
Domestic and foreign patents granted in 2012, per 1 million people*

Peru: 0.93
Chile: 13.52
Argentina: 8.62
Brazil: 5.17
Colombia: [value illegible in source]
Paraguay: 0.45

*As recorded by the World Intellectual Property Organization


According to the World Bank, economic indicators suggest that Brazil
should have registered 50% more patents with the US patent office than
it actually did in 2006-10.


NATURE.COM
For more on South American science see: nature.com/southamerica
Despite myriad problems in many
countries, pockets of excellence 
thrive in South American science. 
but science in Brazil beats the World Cup, at least in a financial
match-up. Government and businesses there invest some US$27 billion
annually in science, technology and innovation, dwarfing the price
tag for the football tournament, which tops out at about $15 billion.

Science in Brazil and many other countries in South America has
come a long way since the dark days of the dictatorships just a
generation ago. In Argentina, the number of science doctorates jumped
nearly tenfold between 2000 and 2010; Peruvian scientists tripled
the tally of articles they produced over the same period; and science
funding is climbing in most countries.

South American science still has far to go if it hopes to catch up
with other continents. By many measures, such as investments,
patents and education, the countries there lag behind other nations
with similar levels of gross domestic product (GDP). There is looming
instability in countries such as Argentina and Brazil, where recent
protests reflect deep social and economic divisions, problems that
plague much of South America. But amid the concerns, there are many
bright spots in the world of science. Here, Nature highlights several
examples of outstanding researchers and institutions in the region.


CHILE 


UPWARD 
TRAJECTORY 


BY MICHELE CATANZARO 


When Mario Hamuy finished his university degree in Chile in 1982, he
was the only one in the country interested in pursuing graduate studies
in astronomy. Now, more than 25 Chilean students join such programmes
each year and Hamuy directs the Millennium Institute of Astrophysics in
Santiago, home to 95 students and faculty members.

During the course of Hamuy’s career, Chile has emerged as a major
player in the world of international astronomy, in no small part
because of the extraordinary collection of telescopes housed in the
country’s highlands. “Astrophysics has come to the forefront of Chilean
science thanks to the increase in human resources and to the fact that
we have the cleanest sky in the world,” says Dante Minniti, an
astronomer at the Pontifical Catholic University of Chile in Santiago.

Although Chile invested just 0.44% of its GDP in scientific research in
2011, the latest year for which figures are available, funding for
astrophysics has steadily grown, from $2 million in 2006 to
$6.8 million in 2010. Over the same period, the number of faculty
positions has almost doubled. And the country’s publications in
astronomy have risen more than fourfold during the past decade.

The quality of the work has improved as well. Chile ranks highly in
terms of citations per paper in space science, and some of its
scientists have made important discoveries. In the early 1990s, Hamuy
made a key contribution that helped others to measure the accelerating
expansion of the Universe and win a Nobel Prize in 2011. And Minniti is
one of the leaders at the VISTA infrared survey telescope at the
European Southern Observatory’s Paranal Observatory in northern Chile,
which has created a catalogue of more than 84 million stars in the
central parts of the Milky Way.

Chile’s skies have been attracting international telescopes since 1964.
By 2020, when the European Extremely Large Telescope is due to be
completed, the country is expected to host 70% of the global
observation surface for large optical and infrared telescopes.

By contract, Chilean astronomers get 10% of the observation time on
each telescope installed in the country. But some astronomers say that
this is too little, considering how much the country provides for the
organizations running the telescopes.

“This country has given enormous advantages to the international
consortia, ranging from full tax exemption to diplomatic status: it’s
time that Chile participates in a more active way,” says Monica Rubio,
director of the astronomy programme of the Chilean funding agency
CONICYT.

A unanimous aspiration of Chilean scientists, says Rubio, is not just
to use observatories but also to build them, through local companies
and engineers. Another plan Rubio is working on is developing the
Atacama Astronomical Park, a 36,347-hectare protected area around the
Atacama Large Millimeter/submillimeter Array, which CONICYT plans to
use to attract future telescopes from Brazil and the United States, and
maybe also from China, South Korea and Thailand.

But many astronomers are worried about the governance of science in
Chile. CONICYT has lacked a director since José Miguel Aguilera
resigned eight months ago, and the country’s new president, Michelle
Bachelet, has frozen plans to create a science ministry (see Nature
507, 412-413; 2014). “It's a good moment for Chilean astronomy, but
keeping the momentum will require more sustained support from the
government,” says Minniti.

The European Southern Observatory operates the Very Large Telescope in
northern Chile.


BRAZIL


SÃO PAULO’S HEAVY HITTER


BY GIULIANA MIRANDA


Although Brazil rivals Europe in size, much of the leading research in
South America’s largest country emanates from an area the size of the
United Kingdom. São Paulo, in southern Brazil, is the richest of the
country’s 26 states and publishes more than half of Brazil’s scientific
articles. One of the main reasons for its success is the São Paulo
Research Foundation (FAPESP), the state agency that promotes research
and education. In 2013, the agency invested $512 million in science
funding, more than many nations in the region. (At the federal level,
Brazil's National Council for Scientific and Technological Development
has a budget of about $650 million for science, technology, and
innovation in 2014.)

Created in 1960, FAPESP
has a stream of funding guaranteed
by the constitution of
São Paulo, which requires
that 1% of the tax revenue
goes to the foundation. Its success in fostering
research and education inspired other Brazilian 
states: all but one now has a similar agency, and 
most have guaranteed funding linked to taxes. 

FAPESP directs 37% of its funding to basic 
research in fields ranging from climate change 
to particle physics. About 10% goes to infra- 
structure and the rest is channelled to applied 
research. Nearly one-third of its total budget is 
devoted to medical research. 

“One difference in FAPESP’s work is that 
we invest a lot in basic science,” says Carlos 
Henrique de Brito Cruz, FAPESP’s scientific 
director. “We believe in balance.”

The most recent large project approved for 
funding is the Long Latin American Millimeter 
Array radio telescope, a joint project between 
Brazil and Argentina that will receive $12.6 mil- 
lion from the agency and an equal amount from 
Brazil's science ministry. FAPESP’s board is considering
a $40-million investment in the Giant
Magellan Telescope, which would give São
Paulo astronomers access to the facility, planned
for construction in Chile. 

Science officials in other nations can only 
look with envy at the agency’s guaranteed 
funding. “FAPESP is a very interesting model
for us because São Paulo is one of the few
states in the world where support of research
is linked directly to GDP,” says Martyn Poliakoff,
foreign secretary and vice-president of
the Royal Society in London.

Regional agencies such as FAPESP play a 
very important role in Brazil, says Wanderley 
de Souza, a biomedical scientist at the Federal 
University of Rio de Janeiro and a member 
of the Brazilian Academy of Science. “They 
can make research happen even if the federal 
funding gets scarce.” 

Brazil struggles with vast economic differ- 
ences among its various regions, and that is 
reflected in regional science budgets. FAPESP 
has the biggest budget of all the regional agen- 
cies, but that does not reduce federal invest- 
ments in the state, says Clelio Campolina, the 
minister of science, technology and innova- 
tion. “We want to improve other states, but also 
reward excellence," he says. 

FAPESP’s rapid growth has raised some concerns
among scientists in São Paulo who complain
about an increase in bureaucracy. But
agency officials defend its performance and 
say they are working to improve its procedures. 

It’s all part of an effort to produce high- 
quality work, says Brito Cruz. “We want the 
best projects.”


semiconductor from a FAPESP-funded project.
COLOMBIA


GROWTH 
CENTRE 


BY LISA PALMER 


In the Cauca Valley of western Colombia, a herd of hefty cows at
Petequi farm munches away on lush grass that looks as if it has grown
there forever. But the plants are relative newcomers. They are
cultivars of African super grasses, bred for enhanced nutrition and
hardiness by researchers at the International Center for Tropical
Agriculture (CIAT), less than 50 kilometres to the north.

Cows at Petequi once took four years to reach market weight. Now they
fatten up in just 18 months. The story is much the same throughout the
South American cerrado, or savanna. The improved grasses have
revolutionized tropical forage across the continent thanks to the
combined work of researchers at CIAT and the Brazilian Enterprise for
Agriculture Research, a state-owned Brazilian company, says Eduardo
Trigo, an agricultural economist and science adviser to the Argentine
ministry of science, technology and innovation in Buenos Aires. “CIAT
has been one of the key actors in the development of the South American
cerrado,” he says.

Established in 1967, the Colombian facility was one of the first
members of the CGIAR consortium of international agriculture research
centres. CIAT employs 325 scientists and has an annual budget of
$114.4 million, paid for by the multi-donor CGIAR fund and by other
international donors.

Aside from its work on grasses, CIAT has focused on breeding improved
varieties of beans, rice and cassava, staple crops that are important
to the food security of the rural poor. “Genetic improvement of these
crops has proved to be a powerful weapon for combating hunger and
poverty,” says Ruben Echeverria, director-general of CIAT. For example,
beans developed by CIAT from Latin American varieties are now feeding
up to 30 million people in Africa, according to the centre.

Some 70% of rice in South America, and 90% of cassava in Asia, can be
traced back to CIAT’s breeding programme. “Cassava is now a
multibillion-dollar business for starch production in Asia, providing
income to smallholders,” says Andy Jarvis, leader in policy research at
CIAT.

The centre has also helped to grow expertise on the continent and
elsewhere; since CIAT opened, some 13,000 researchers have trained
there. Its facilities have been instrumental in building capacity for
plant physiologists in the poorer countries of the Andean region, says
Trigo.
Alberto Kornblihtt:
RNA pioneer. 


ARGENTINA


THE RNA SLEUTHS 


BY ALESZU BAJAK 


Molecular biologist Alberto Kornblihtt likes to put things in perspective. “We may be on
the periphery of scientific research,” he admits from his office in Buenos Aires. “But it’s
not an impossible place to do science.” In fact, he and his community of researchers in
alternative RNA splicing (a field he helped to create) have shown that they can do world-class
research despite tight government budgets and three-month delivery times for reagents
that can cost three times as much as they would in the United States or Europe.

Like Kornblihtt’s lab, alternative RNA splicing makes use of constrained resources in
innovative ways. Through varied patterns of cutting and rejoining, a single transcribed gene
can give rise to many different messenger RNAs, thus permitting a single gene to express
different proteins. Kornblihtt found one of the first cases of this process in humans while he
was a postdoctoral fellow in the United Kingdom. He moved back to Argentina in 1984 and
has assembled a group of researchers that continues to explore this realm.

It has been a good year for his group. Kornblihtt and his doctoral student Ezequiel Petrillo
published a paper in Science in April on how light affects alternative splicing in plants
(E. Petrillo et al. Science http://doi.org/s2d; 2014). And last month, Gwendal Dujardin, a
postdoctoral fellow from France (a rare sight in an Argentine lab), published a splicing study
in Molecular Cell (G. Dujardin et al. Mol. Cell 54, 683-690; 2014).

The work is all part of a continuum, says Kornblihtt. He considers scientific research in his
native Argentina to be part of a long tradition that started with Bernardo Houssay and Luis
Leloir, twentieth-century Nobel laureates whose names now adorn avenues, museums and
universities across the country. “The scientific institutions they founded led to generations
of disciples that continue to do the science of today,” he says.

Kornblihtt carries on that tradition, in part by teaching an introductory course on molecular
biology at the University of Buenos Aires. “That course has been a nursery for many
young Argentine scientists,” he says. It lures in many students, says Diego Golombek, a biologist
at the National University of Quilmes in Buenos Aires. “Imagine that on the first day of
classes, young students find themselves before the country’s most well-known researcher
teaching molecular biology classes with an absolutely contagious enthusiasm,” he says. “He's
had an influence over the new generations of biologists.”

Petrillo, who has just left Argentina for a research post at the Medical University of Vienna,
says that he will sorely miss the camaraderie of the tight-knit group of RNA researchers from
labs and universities all over Buenos Aires. The RNArgentinos, as they call themselves, have
for years organized informal seminars and get-togethers to share ideas, concerns, protocols
and techniques.

Kornblihtt recognizes that Argentine scientists cannot all work in their home country and
he encourages his students to “seed the world” as postdocs abroad. But he asks his university
students to complete their PhDs in Argentina. “It’s not necessary to leave the country to get
a doctorate,” he says. “We have a strong science ministry, lots of scholarships and subsidies
and new research buildings. The structure to do science in Argentina is not precarious. It
has many pillars.”
HOMEWARD BOUND


South American efforts to repatriate scientists are paying off. 


BY BARBARA FRASER 


When Andrea Bragas left Argentina in 2000 for a postdoctoral fellowship
at the University of Michigan in Ann Arbor, she did not
know where she would eventually end up. Although the terms of
her fellowship obliged her to return home, Argentina’s economy was
heading for a crisis and there was no guarantee of continued government
funding, much less a job when she came back.

But the gamble paid off. By 2004, Argentina's economy had started to 
rebound and the president was pledging new investments in science and 
technology. Bragas returned to teach at the University of Buenos Aires 
and is now a nanoscientist at CONICET, Argentina's National Scientific 
and Technical Research Council. 

Across South America, thousands of researchers have similar stories. 
Countries that saw some of their most promising scientists flee during 
decades of dictatorships or economic crises are now reversing the brain 
drain, luring researchers home with offers that range from short-term 
teaching and research fellowships to fully equipped labs and competi- 
tive salaries. 

“Unlike financial capital, which is hard to recover once it has left the
country, intellectual capital returns with interest,” says Lino Barañao,
Argentina's science and technology minister. “A scientist who has spent
some years outside the country has training, networks of contacts and
access to top institutions, and from a productivity standpoint can be
more valuable than one who has stayed in the country.”

Brazil was one of the first South American countries to invest in build- 
ing a base of researchers. When Lindolpho de Carvalho Dias attended 
the first Brazilian Mathematics Colloquium as a student in 1957, he was 
one of about 50 participants in a country that had few universities and 
no graduate programmes in science. 

But the government was taking major steps to close the education 
gap. In the early 1950s, it created the National Council for Scientific 
and Technical Development (CNPq) and launched a higher-education 
campaign. Since then, Brazil has paid to send students abroad for gradu- 
ate study, with the commitment that they would come back to teach and 
do research. Many of those who returned became staff members in new 
graduate programmes and the country has ramped up its production 
of scientists and engineers. The number of doctorates awarded in those 
fields per year nearly doubled between 2001 and 2011. 

As a measure of the country’s scientific growth, the mathematics col- 
loquium currently draws about 1,000 participants a year. And research 
institutes in Brazil now attract both home-grown and foreign talent, 
adds Dias, who has served as director of the CNPq and as executive 
secretary of the Ministry of Science and Technology. 

Like Brazil, Argentina has long sent students abroad for graduate 
education. But the country has only recently devoted sustained and 
coordinated funding to provide opportunities for returning researchers 
like Bragas. The science ministry now runs a programme called RAICES 
(‘Roots’) to encourage researchers to return home with offers of fully 
equipped laboratories and salaries comparable to those in the United 
States and Europe. 

So far, 1,062 Argentinean scientists have returned. Most have gone to
public universities or research centres, although Barañao expects that to
change as Argentina's private technology sector cranks up. The employer
usually provides laboratory facilities, and RAICES pays moving costs
and subsidizes salaries for a few years. As an added incentive, it also
helps with placements for spouses.


Andrea Bragas in her nanotechnology laboratory.

In Chile, the Millennium Scientific Initiative — launched in 1999 
— has set up centres of excellence and offers study-abroad fellowships 
with a commitment to return home to work. It has also established a 
programme called ChileGlobal, which lets Chilean scientists network 
at home and abroad through seminars and other activities. 

Countries with smaller science budgets are also experimenting with 
ways to repatriate researchers through fellowships, networking and 
incentives. In March, Colombia’s Department of Science, Technol- 
ogy and Innovation announced the US$9-million ‘It’s Time to Return’ 
repatriation programme. The initiative offers research posts in various 
fields, and hopes to lure back 500 Colombian PhD holders in its first 
two years. 

Although brain-drain-reversal programmes take different forms, 
Barañao says that the key is to harness the expertise, contacts and
experience of researchers outside the country — many of whom were 
educated at least partly at the taxpayer’s expense — while expanding 
research facilities and opportunities at home. 

Ultimately, the long-term success of these efforts may depend on 
the willingness of governments and companies to increase research 
investments, which have been climbing only modestly relative to gross 
domestic product in most South American countries. “You have to create
a competitive research environment with top-quality, interdisciplinary
research centres,” says Barañao. “Even if you offer a good salary or pay
relocation expenses, without those conditions, a good researcher won’t
return.”


Barbara Fraser is a freelance writer in Lima. 
Architects of
South American 
science 


Ten research leaders call for policies to build science, 
and ways to build science into policy. 
Strengthen
networks 


Eduardo Arzt is director of the 
BioMedicine CONICET-Partner 
Institute of the Max Planck Society, 
Argentina 


Regional and cross-continental networks 
strengthen science in South America. They 
encourage young scientists to return home, 
motivate governments to invest in their own 
science, and fill gaps in core technologies 
such as advanced microscopy and proteo- 
mics, which require sophisticated instru- 
ments. A number of initiatives in recent 
years illustrate several creative approaches. 

One model relies on partnerships with 
other prestigious institutes. For example, 
Uruguay’s Pasteur Institute in Montevideo 
was founded in 2004 through an agreement 
with its counterpart in Paris, and the Bio- 
medicine Research Institute of Buenos Aires, 
inaugurated in 2011, is a partner institute 
of the German Max Planck Society. Both 
institutes have recruited dozens of young 
researchers and built dedicated laborato- 
ries. They have also appointed international 
boards of scholars to offer advice and evalu- 
ate the quality of the science. This positive 
feedback loop should motivate similar evalu- 
ation schemes across other institutions. 

Other programmes also foster collabora- 
tions between scientists in South America 
and scientists in North America and Europe. 
In April, Argentina became an associate 
member state of the European Molecular 
Biology Laboratory (EMBL). Symposia 
have already been organized, and Argentin- 
ian scientists now have access to the EMBL's 
state-of-the-art resources. 

The Millennium Science Initiative (active
in Chile and Brazil), the US National Institutes
of Health’s Fogarty International
Center, the Howard Hughes Medical Institute,
the Pew Charitable Trusts (see
page 213) and the Partner Groups of the
Max Planck Society all sponsor individual
scientists to help to create a critical mass in
fields such as molecular biology, neuroscience
and nanotechnology.

Regional entities have recognized the benefits
of such programmes. The multinational
South American trade group MERCOSUR,
through its fund FOCEM, provided
US$7 million to build a
biomedical research network
spanning six institutions in
Argentina, Uruguay, Brazil
and Paraguay. The network
will foster research, training
and technology transfer in
molecular medicine. National
governments will chip in a
further $3 million.

These networks are building 
momentum in the region’s sci- 
ence. As they begin to bear fruit, 
the time is right to build on them 
and not become complacent. 


PERU 
Build research 
capacity fast 


Gisella Orjeda is president of 
the National Council for Science, 
Technology and Technological 
Innovation, Peru 


It is an exciting time for science in Peru. After 
years of neglect, the budget of the National 
Council for Science, Technology and Tech- 
nological Innovation (CONCYTEC) has 
grown 20-fold in just 18 months to almost 
US$110 million, and it will continue to grow 
at the same rate. For the first time, Peru has 
a president who is prioritizing science and 
innovation. Journalists are trying to grasp 
and explain new concepts. 

Now Peru needs highly qualified scientists 
and scientific managers. We must learn how 
best to organize calls for proposals, allocate 
funds, build programmes and reach compa- 
nies. Then we must work out how to build 
prosperity with our new-found knowledge. 

CONCYTEC establishes and promotes 
national policies for science, technology 
and innovation, and funds research. We 
work with local governments, the private 
sector, scientific institutes, universities and 
colleges. This is a big task for an organization 
of 148 people that until 2012 had an annual 
budget of just $6.3 million and almost no 
information about the set of institutions that 
produce, transfer and use knowledge. 

We are building these capacities: defin- 
ing evidence-based policies and priorities, 
adhering to conflict-of-interest guidelines, 
and establishing a merit-based review of
proposals and incentives for innovation. 
We are eliminating rigid rules for immigra- 
tion, buying scientific equipment and hiring 
qualified personnel. 

I returned to Peru eight years ago, after
spending ten years in France, because I
wanted to make a difference in my country.
After publishing the potato genome
in Nature in 2011, I never imagined that I
would have to leave science to lead science,
but I have no regrets. It is thrilling to be at
the helm of CONCYTEC as we face the
formidable challenge of constructing a
knowledge-based economy.


BRAZIL 
Boost pro-forest 
economics 


Carlos Nobre is national secretary for 
research and development policies at 
the Ministry of Science, Technology 
and Innovation of Brazil 


The deforestation of the Amazon must stop: 
when forests are cleared for agriculture, 
cattle ranching and logging, the damage 
is felt environmentally, economically and 
socially. But simply curbing deforestation is 
not enough: sustainable-development strate- 
gies must also improve well-being for local 
communities. 

Unfortunately, the global economy places 
a higher premium on meat and soya beans 
than on forests. Creating a new economic 
model for the Amazon forest will there- 
fore take two transformations; both require 
science. 

One strategy is to add value to locally harvested
products. A good example of such a
bioindustry is the açaí fruit of the palm tree
Euterpe oleracea that grows in the Amazon.
Until around 20 years ago, the dark berries
were a food staple consumed only by the
local population. Today, açaí fruit is used in
products including food, nutritional supplements,
cosmetics, dyes and industrial oils
around the world. Annual pulp production
exceeds 200,000 tonnes and contributes
more than US$2 billion to Brazil’s economy,
second only to beef and tropical timber.

Local açaí producers can make more
than $1,000 per hectare in annual profit,
5-10 times more than from soya and at least
15 times more than from cattle. Embrapa
(the Brazilian Agricultural Research Corporation)
has used açaí to produce a dye for
bacterial plaque that is now ready for commercial
use in toothpaste and mouthwash.

More research is needed to identify uses 
for new and known natural products, and 
to scale up production. In a decade or two,
it should be feasible to increase the exploi- 
tation of dozens of forest products. 

A second strategy is to make better use of 
the large areas of already cleared forest — 
estimated at more than 750,000 square 
kilometres in the Brazilian Amazon 
alone — to reduce the need to clear even 
more. A nationwide Low Carbon Agricul- 
ture Program aims to more than double cat- 
tle occupancy per hectare within a decade. 
Field research conducted by Embrapa and 
the Brazilian cosmetics company Natura 
showed that oil-palm plantations on small- 
holdings could be integrated with other 
crops, such as nitrogen fixers, to obtain 
yields comparable to those of large-scale 
plantations. 

Both these transformations require
educating the rural and urban populations
to change their ways. Technical programmes
to increase agricultural productivity must
reach hundreds of thousands of farmers.
Isolated traditional populations will need
help to reap value from collecting and
selling products of biodiversity. Doing so
will rely on modern communication: a new
government-owned telecommunications
satellite is set to start operations in 2016
to bring high-speed Internet to
communities in the Amazon.


CHILE 
Empower coastal 
research 


Juan Carlos Castilla is professor 
emeritus at the Pontifical Catholic 
University of Chile 


Rich countries can protect vast areas of their 
seas. Australia bans fishing in 345,000 square 
kilometres of the Great Barrier Reef; Califor- 
nia protects about 16% of its coastal waters, 
some 2,200 km². This approach will not work
in the parts of the developing world where 
people’s livelihoods depend on coastal fish- 
ing. A promising alternative is community- 
centred stewardship, boosted by research 
and education. 

The Chilean government grants coastal 
communities exclusive territorial use rights 
in fisheries (TURFs) to extract seafood from 
a designated area, in exchange for a man- 
agement plan that limits the annual catch 
proportion of algae and benthic organisms 
(bottom-dwelling animals including mol- 
luscs, shrimp and crabs). Around 500 of these 
co-management areas encompass more than 
1,100 km². The areas are only 4-10 km apart,
so larvae and young animals from one area 
can disperse into another. 

This system of fishery co-management 
was established in 1991. Communities differ 
in their performance, but results reported in 
2012 revealed a desirable by-product: TURF 
areas show robust increases in the biodiver- 
sity of invertebrates, algae and rockfishes 
compared to uncontrolled areas. 

Co-management empowers people to 
care for their resources. If a port or power
plant begins operations nearby, communi- 
ties demand that any damage to their area is 
assessed and compensated. 

In unmanaged areas, the coast is over- 
fished. TURFs are not enough. One strat- 
egy is to develop communal-management 
approaches for specific resources in the areas 
that can be fished by anyone. Regulations in 
Chile that came into force last year will set a 
total allowable catch for key species, attempt- 
ing to account for a marine stock’s reproduc- 
tive, growth and mortality rates. A network 
of no-take areas between TURFs would 
also help. Ocean life in the no-take areas
could help to restock depleted populations.

We must learn from experience, documenting
and assessing the effects of ecosystem
management. If these strategies fall into
place, communities can continue to fish,
protect biodiversity and safeguard coastal
ecosystems.


ARGENTINA 
Fuel public- 
private consortia 


Lino Barañao is Minister of
Science, Technology and Innovative 
Production, Argentina 


After a decade of policies aimed at boosting 
research, science in Argentina is starting to 
have positive effects on economic develop- 
ment and society. Now, greater involvement 
from the private sector is required. 

Five years ago, the Argentinian govern- 
ment launched the Sectoral Funding Strat- 
egy to promote public-private consortia. 
From 2008 to 2013, more than 5,000 compa- 
nies, including 80 start-ups, received a total 
of US$800 million as grants or loans with 
below-market interest rates. The govern- 
ment also created programmes for postdocs 
and established researchers to gain experi- 
ence in private companies. The number of 
scientists in industry increased from 7,200 
in 2003 to 12,300 in 2012, and is expected to 
rise to more than 18,000 by 2020. 

Projects funded by the strategy must 
combine a key enabling technology (such as 
biotechnology, nanotechnology, or informa- 
tion and communications) with a strategic 
area (such as health, energy, or environment 
and social development). They must also 
provide a business plan to bring an innova- 
tive product or service to market within five 
years. Some projects have already moved 
beyond proof of concept, including pro- 
duction of human growth hormone in the 
milk of transgenic cows and nanotechnology 
systems for drug delivery. Another example 
is Satellogic, a company that is developing 
nanosatellites for imaging. It is about to 
launch its third prototype and has already 
received private investment. 

In 2012, Argentina's national research 
council, CONICET, and its national petro- 
leum company, YPF, came together to create 
a joint company called Y-TEC. The firm, 
which employs more than 70 researchers, is 
developing technologies to exploit renewable
energy and unconventional oil such as shale,
and has already submitted six patent
applications, three of which are licensed.

In developing countries, the science and
technology sector cannot focus only on 
cutting-edge technologies; it must also 
promote social inclusion. The latter is 
illustrated by the Guanaco Project in the
Andes, which is developing textiles for the
‘responsible luxury’ market. Guanacos, close
cousins of llamas and vicuñas, produce a fibre
superior to cashmere.

In the past, science had only a cultural 
role in Argentina. Now it is contributing to 
a knowledge-based economy as a means to 
achieve a more just society. 


BRAZIL 
Reward quality 
not quantity 


Sidarta Ribeiro is director of the Brain 
Institute at the Federal University of 
Rio Grande do Norte, Brazil 


In the past decade, the Brazilian govern- 
ment has put substantial resources into 
education and science. It has: established 
a minimum wage for school teachers; allo- 
cated 1.2% of the gross domestic product 
to fund research; and launched the Science 
without Borders scholarship programme 
to attract foreign talent to the country and
to help promising Brazilian researchers to
train abroad.

Two of the biggest remaining barriers to
improving the nation’s research are
performance evaluation and rewards. Valuing
quantity over quality is so ingrained in
Brazil’s scientific culture that it is
nicknamed numerologia (numerology), a pun
on the mystical belief in the power of numbers.

The official Qualis system for the evalua- 
tion of scientific papers and journals — which 
carries heavy weight in grant and job applica- 
tions — encourages Brazilian researchers to 
publish as many papers as possible, regardless 
of the international impact of their research. 
Qualis does recognize different tiers of jour- 
nals, but the categories are so broad as to be 
almost meaningless: a paper published in a
journal such as Nature or Science and one in a
highly specialized journal might be counted 
equally. Rather than gathering a full set of 
experiments into a coherent story, scientists 
gain more recognition in the system by break- 
ing related work into multiple papers. 

Independent international evaluations at 
universities and research institutes might be 
the key to rewarding innovation and cutting- 
edge science more effectively. 


VENEZUELA
Respect science 
and scientists 


Claudio Bifano is president of the 
Venezuelan Academy of Physical, 
Mathematical and Natural Sciences 


Much of Venezuela’s technology and 
scientific capacity, built up over half a cen- 
tury, has been lost in the past decade. We 
need to restore respect and funding to basic 
research to halt the brain drain and reverse 
this catastrophic trend. 

In recent years, Venezuela has invested 
more than 2% of its gross domestic prod- 
uct (GDP) in science and technology, and 
boasts a workforce of about 13,000 scientific 
researchers. But the number of publications 
in international journals declined by 40% 
in 2008-12, from roughly 1,600 to 1,000. 
The total number of publications in 2012 
matched that of 1997, when the country had 
fewer than 3,500 researchers, and a science 
and technology budget of just 0.3% of the 
GDP. 

According to a 2011 survey, 51% of Ven- 
ezuelans over 25 years old living in the United 
States have finished university (compared to 
13% of the US Hispanic population and 29% 
of all US residents). The online publication 
Piel-Latinoamericana reports that 1,100 of 
1,800 physicians who graduated from medi- 
cal school in Venezuela in 2013 have left the 
country. In other words, educated Venezue- 
lans are fleeing — or are being forced out. For 
example, in 2003, roughly 1,000 professionals, 
mostly physical scientists and engineers, were 
fired from Venezuela’s petroleum research 
and development institute, INTEVEP. Inter- 
national agencies report that no patents have 
been granted since that time. 

Since 1999, the Venezuelan government 
has imposed a political model called social- 
ism of the twenty-first century. I and others 
find it based mainly on authoritarianism, 
with some ideas from Marxist philosophy 
and extreme populism. Science, according 
to the minister for science and technology, 
is for the solution of societal problems. The 
National Science, Technology and Innova- 
tion Plan (2005-30) says that science must 
be conceived as a process that involves new 
participants, such as the holders of tradi- 
tional and local knowledge. 

To achieve this goal, the Ministry of 
Science, Technology and Innovation sup- 
ports projects submitted not only by sci- 
entists but also by those without scientific 
training and by organizations such as com- 
munity councils, environmental groups and 
associations geared towards the social services.
Funded programmes include one that
distributes computers to school children and
missions for a remote-sensing satellite and 
a data-transmission satellite. These may be 
laudable projects, but they are not science. 

Allowing those who lack scientific train- 
ing to access public funds for scientific 
research trivializes science. 


BRAZIL 
Banish 
bureaucracy 


José Eduardo Krieger is provost
of research at the University of São
Paulo, Brazil


Brazil needs a better environment for knowl- 
edge creation and innovation. Bureaucracy 
currently holds back research. Fixing this 
will require changes to institutional policy 
and national legislation. 

At the University of São Paulo, for
instance, we began a major initiative in 2011 
to enable scientists to focus on what they 
do best, rather than wasting time filling in 
forms. The university is the largest research 
institution in South America, responsible 
for about 20% of all papers published in Bra- 
zil every year. The institution’s 6,000 scien- 
tists win almost half of the US$450 million 
that the state of São Paulo awards to support
research. 

But most Brazilian grants do not cover 
overhead or indirect costs, such as facility 
maintenance. So our universities lack the 
support offices that North American and 
European researchers rely on to help with 
ordering equipment and reagents, pay- 
ing invoices, financial reporting, contract 
negotiation and account monitoring. Every 
researcher must set up these systems indi- 
vidually. 

By the end of this year, the University of 
São Paulo will roll out a digital platform
to assist researchers with procurement, 
accountability and operations. We are also 
creating a network of trained project man- 
agers to assist specialized schools and large 
research groups. These measures follow a 
$100-million, four-year effort by the university
to reorganize its research enterprise.
More than 100 research support groups
have been created, each with a technician, to 
encourage scientists to organize themselves 
into interdisciplinary clusters. 

These strategic moves will be comple- 
mented by improvements in the regula- 
tory laws currently under discussion in the 
Brazilian Congress. These should allow 
equipment and consumables for academic 
research to be imported more quickly and 
easily — giving our scientists more time for 
research, and helping them to compete with 
their peers in North America and Europe. 


CHILE 
Base policy on 
evidence 


Pablo C. Guerrero is assistant 
professor at the University of 
Concepcion, Chile; Mary T. K. Arroyo 
is director of the Millennium Institute 
of Ecology and Biodiversity, Chile 


Chile needs a system for formulating public 
policy on the basis of sound scientific infor- 
mation. The government's decision in March 
not to create a ministry of science passed up 
a valuable opportunity for that. 

The current disconnect between science 
and policy within the government is wor- 
rying, as two recent examples show. First 
is the devastating fire that swept through
parts of the city of Valparaiso in April. For
decades, authorities ignored ecologists’
warnings about expanding highly flammable
eucalyptus plantations that are now near
many cities in central Chile, and where the
Valparaiso fire started.

Second, Chile has experienced seven 
earthquakes measuring magnitude 7 or 
more in the past decade. Here, too, scant 
attention was paid to scientists’ predictions 
about the accumulation of seismic strain. 

Some have suggested that Chile’s highly 
regarded science-funding body, the 
National Commission for Scientific and 
Technological Research (CONICYT), 
could regain its past influence and advise 
on public policy once more. To do so it 
will need to adjust its current emphasis on 
impact factors and international recogni- 
tion of basic science. CONICYT should 
give explicit credit to basic-science prob- 
lems that are relevant to the concerns of 
Chileans, such as the availability of water 
resources in a changing climate and innovative
ways to use minerals.




ILLUSTRATION BY DAVID PARKINS 


Turning brain 
drain into brain 
circulation 


Overseas scholarships that encourage scientists to 
return to their home countries are helping to rebuild 
science in Latin America, says Torsten Wiesel. 


It takes a long time for a country to
build a strong base in science, but only
a short time to destroy it. Germany was
a sad example. It was a world leader in the
sciences for more than a century, until its
science base was demolished during the
Nazi era, and the country ceded its position
to the United States. It has taken decades for
Germany to rise again to its current level of
excellence.

The German experience has much in 
common with the situation in Latin Amer- 
ica, where authoritarian regimes came to 
power in the mid-twentieth century in coun- 
tries including Brazil, Chile and Argentina. 
As a consequence, many of the continent’s
best scientists emigrated to the United States, 
Europe and Canada. When the dictatorships 
were finally shaken off in the 1980s and 
1990s, the departed scientists were settled
in their new homes and had little incentive
to return to countries left laden with debt. 

Many have forgotten that science in 
Latin America was once robust. For exam- 
ple, Bernardo Houssay, who won the 1947 
Nobel Prize in Physiology or Medicine, 
directed the Institute of Physiology at 
Buenos Aires University until 1943, when 
the government fired him for advocating 
for democracy; his protégé, Luis Leloir, won 
the 1970 Nobel Prize in Chemistry. Several 
emigrants also became laureates, includ- 
ing the immunologist Baruj Benacerraf, 
from Venezuela, and the biochemist 
César Milstein, from Argentina. 

Against this background, the Pew Latin 
American Fellows Program was founded to 
help to rebuild and strengthen biomedical 
sciences in the region. From its inception, 
the programme has been linked to the pre- 
existing Pew Biomedical Scholars Program, 
which each year provides around 20 promis- 
ing newly independent US scientists with 
four-year scholarships, funded by the Pew 
Charitable Trusts, a non-profit organization 
based in Philadelphia, Pennsylvania. 

In March 1989, at the annual meeting 
of the scholars programme in Puerto Val- 
larta, Mexico, a group of these scholars 
— struck by the lack of resources of their 
counterparts in Mexico — sought help from 
Rebecca Rimel, president of the Pew Chari- 
table Trusts. Later, Rebecca and I discussed 
the best ways to train talented students from 
Latin America, and our ideas crystallized 
into the fellows programme. 


REPATRIATION RATES 

Since the founding of the Pew Latin 
American Fellows Program in 1991, about 
ten graduate students each year have been 
awarded two-year postdoctoral fellowships 
to work in some of the best labs in North 
America. It is no surprise that some remain 
abroad to continue their careers in more 
developed countries. What is surprising is 
that more than 70% return to their home 
countries, which may not always allocate 
sufficient resources to cutting-edge research 
(see ‘Bringing science home’). For compari- 
son, the Human Frontier Science Program, 
a multinational initiative that supports the 
life sciences, also funds postdoctoral fellows 
worldwide — but fewer than half of those 
who train in the United States return to their 
home countries. 

Pew fellows who remain in North 
America have positions in leading uni- 
versities and several have established joint 
projects with labs in their home countries, 
as well as hosting new fellows. The annual
meetings are attended by Latin American
fellows, biomedical scholars and senior
advisers, including Nobel laureates and
Howard Hughes Medical Institute scholars.
Participants share ideas and start collaborations
as a result of the meetings.


PEW LATIN AMERICAN FELLOWSHIP

Bringing science home

Becoming a great scientist requires exposure to greatness.

At a 1997 orientation meeting in Costa Rica for new postdocs,
Torsten Wiesel, the co-founder of the Pew Latin American Fellows
Program, told us that the best scientists are not necessarily more
creative or smarter than everyone else, but that they had the
opportunity in their junior years to conduct and discuss science in
prime environments.

I earned my PhD in 1996 from the University of Chile in Santiago,
studying how ions move through proteins extracted from neurons.
I wanted to apply that work in living brains. Senior members in my
department told me about the Latin American fellows programme
and helped me to find a postdoctoral adviser.

Charles Zuker, then at the University of California, San Diego,
accepted me into his lab and taught me to study how flies sense
the world. It was an amazing experience to be in the Zuker lab when
seminal work on taste and pressure receptors was happening. I
was part of the team that helped to show how the organization of
proteins in photoreceptor cells is essential for flies to see light. I
returned home to work as a junior professor at the University of
Chile in 1998.

Even now, few institutions in South America provide start-up
funds to new faculty members. Most young professors have to join
senior laboratories or sit in an empty lab, sometimes for more than a
year, before getting their first grant. By contrast, I had a US$35,000
repatriation fund from my Pew fellowship. The money was enough
to buy small, essential equipment to start doing some simple
experiments soon after I returned: a table-top centrifuge to separate
cells into basic components, power supplies, electrophoresis
chambers to run gels for DNA analysis, a mechanical shaker to grow
bacteria and some reagents.

Since then, I have trained nearly two dozen students to work with
flies and have helped four researchers to set up their own labs for fly
research in Chile. I have also directed three international courses to
train Latin American students to use the insects (and, more recently,
worms) as animal models.

And my relationship with Pew continues. I have started
collaborations with scientists from other countries whom I met
at annual Pew alumni meetings. For the past five years, I have
served on the regional Pew committee that selects six Chilean
candidates for the fellowship. We look for young researchers who
have connected with a great lab and proposed adventurous projects,
particularly to work in areas or with animal models that are not
available at home. The hope is that they will bring those skills back
to their native countries.

Chile has an 80% repatriation rate. That bespeaks both a good
selection process and the importance of the start-up money for
returning fellows. Scientific agencies and governments in Latin
America should try to replicate these measures to help to build a
stronger and more innovative scientific community.
Jimena Sierralta, University of Chile


SUCCESSFUL SCHOLARS 

In a survey sent out in 2013 to 202 alumni
of the Latin American fellows programme 
between 1991 and 2011, an impressive 151 
responded. Alumni who have returned to 
their home countries include department 
heads and university provosts. Nearly half 
reported holding a director position, such 
as department chair or head of an academic 
discipline. On average, each fellow had 
published 15 papers, and those who had 
returned home had trained 13 scientists, 
from technicians and graduate students to 
visiting scholars. 

Last month, the journal Cell high- 
lighted a 2003 Pew fellow, immunologist 
Dario Zamboni, as one of 40 notable sci- 
entists under 40 years old. Zamboni is head 
of the Innate Immunity and Microbial 
Pathogenesis laboratory at the University 
of São Paulo in Brazil. His group is working
out how the body responds to intracellular
parasites, including the one that causes
Chagas disease — a problem in poor, rural 
areas of South America. Doing science in 
Brazil involves hurdles that would not exist 
in the United States, but he is determined
to improve the system for other scientists
in the country. 

Selection of fellows starts with estab- 
lished researchers in Latin America. 
Argentina, Brazil, Chile and Mexico have 
national committees of former Pew fel- 
lows and senior scientists. Each committee 
selects six applicants by evaluating research 
proposals and interviewing a dozen or so of 
the most promising students. (The chairs
of these committees act together as a fifth
multinational committee for applicants
from the other countries in the region.)
Thirty applications are chosen in total to be
evaluated by a central committee of outstanding
US scientists with strong ties to Latin America.
Several are emigrants from the dark periods in
their countries of origin. These committees work
hard to select the most promising scholars and
send them to the best labs.

The Brazilian state of São Paulo plans
to augment the benefits that are open to 
returning Pew fellows: they can apply for 
a generous four-year stipend to get their 
new labs off the ground. The hope is that 
other nations will use their own resources 
to extend this initiative to foster their best 
scientists.

The absolute number of Latin American
fellows is small: fewer than 250 in a region
with more than 400 million people. But my 
impression is that they have an outsized 
influence, shaping expectations of what it 
means to be a scientist in Latin America, and
the fellows’ high expectations of themselves.

That said, the fellows programme is just a 
drop in the ocean relative to the need of the 
entire continent. This is perhaps especially 
true now that larger programmes exist in 
several Latin countries to support the train- 
ing of scientists abroad and to encourage 
trained scientists to return home, such as 
the Brazil Scientific Mobility Program (see 
page 207). 

Nonetheless, like a seed planted in a 
fertile soil, the Pew programme has flour- 
ished over the past 20 years. The plant will 
no doubt continue to grow and to sup- 
port its ecosystem. The ultimate success 
would be that this type of programme is 
no longer needed because each country 
would have developed strong, independent 
scientific establishments. But for now, we 
need to bolster the support for scientists in 
emerging countries, in Latin America and 
elsewhere.


Torsten Wiesel is president emeritus of 
Rockefeller University in New York City, 
USA. He won the 1981 Nobel Prize in 
Physiology or Medicine. 

e-mail: wiesel@rockefeller.edu

Robert Rauschenberg’s 1963 installation Oracle, created with engineers Billy Klüver and Harold Hodges.


The third culture 


Michael John Gorman is intrigued by a survey of art
informed and invigorated by science. 


After months of injections with horse
immunoglobulin in 2011, artist
Marion Laval-Jeantet had a transfusion
of horse blood in a Ljubljana art gallery.
She walked around the donor animal
on prosthetic hooves; then samples of her 
hybrid blood were freeze-dried and placed 
in engraved aluminium cases. In 2005, a New 
York gallery showed a starburst of glass orbs 
and aluminium rods depicting the explosion 
of space after the Big Bang, by sculptor Josiah 
McElheny and cosmologist David Weinberg. 
Such are the collaborations chronicled 
by historian Arthur I. Miller in Colliding 
Worlds. Miller argues that we are seeing the 
emergence of a “third culture” — a term 
coined by writer John Brockman — in which 
boundaries between art and science dissolve. 
The past decade has seen a proliferation 
of galleries, labs and residency programmes 
devoted to mingling art and science. Miller 
surveys these, from London's Wellcome Col- 
lection to the Ars Electronica Futurelab in 
Linz, Austria; the Science Gallery at Trinity 
College Dublin (of which I was founding 
director); Le Laboratoire in Paris; and the 
Collide@CERN artist-residency programme 
at Europe's particle-physics lab near Geneva, 
Switzerland. He provides engaging pen portraits
of many of the artists involved, such as
Evelina Domnitch and Dmitry Gelfand, who
experiment with sonoluminescence. 

Miller touches on early examples of cross- 
pollination, such as physicist Niels Bohr’s 
interest in Cubism, but locates the origins 
of the modern art-science movement in 
1966, with 9 Evenings: Theater and Engineer- 
ing. These experimental “happenings” had 
proto-pop-artist Robert Rauschenberg and 
avant-garde composer John Cage as key par- 
ticipants, in the presence of Andy Warhol, 
Marcel Duchamp and other New York art 
luminaries. Unfortunately, technical disas- 
ters and delays led to negative press coverage. 

Colliding Worlds: How Cutting-Edge Science is Redefining Contemporary Art
ARTHUR I. MILLER
W. W. Norton: 2014.

The unlikely hero of Miller’s story is Billy
Klüver of Bell Labs in Murray Hill, New Jersey,
the instigator of 9 Evenings and a gifted
engineer. Klüver’s earlier collaboration
with Swiss artist Jean Tinguely (on Homage
to New York, a self-destructive kinetic
artwork made from bicycle and pram parts)
led to meetings with Rauschenberg and
other high-profile artists. Klüver persuaded
his Bell Labs colleagues that artists would 
stimulate new directions in technology. As 
the philosopher Marshall McLuhan put it in 
his 1964 book Understanding Media, art can 
be considered “precise advance knowledge of 
how to cope with the psychic and social con- 
sequences of the next technology”. 

The twenty-first-century explosion of art- 
and-science programmes and spaces has been 
fostered partly by significant investment from 
funding organizations. Miller documents the 
tension here between art’s roles in illustrating, 
communicating and interrogating science. 

Some artists, including Antony Gormley 
or McElheny, draw on areas such as foam 
physics or cosmology as aesthetic inspiration. 
Others provoke critical discussion around 
future directions of research, such as Aus- 
tralian performance artist Stelarc, who had an 
ear-shaped scaffold implanted into his forearm.
Discussing the rationale of the Collide@CERN
residency, which has featured sound
sculptor Bill Fontana and photographer and 
media artist Julius von Bismarck, CERN 
director Rolf-Dieter Heuer tells Miller that 
revealing what scientists are doing for society 
is key. “To transmit that through art ... opens 
horizons,” says Heuer. It is a suggestion that 
would sit uneasily with many of the critically 
engaged artists whom Miller discusses. 

Experimental art-science collaborations 
have not always been embraced by conven- 
tional galleries and collectors. Peter Weibel, 
founding director of the Centre for Art and 
Media (ZKM) in Karlsruhe, Germany, tells 
Miller that this should not be a concern 
because “private industry will finance” art- 
ists, liberating them from the vagaries of the 
market. Indeed, Colliding Worlds opens with 
the heady atmosphere of Bell Labs half a century
ago; towards the end, it considers digital
artists Scott Draves and Aaron Koblin, who 
both work for Google. 

Unlike other surveys, such as artist Stephen 
Wilson's Art + Science Now (Thames and 
Hudson, 2010), Colliding Worlds features 
interviews with the artists, scientists and engi- 
neers involved in projects from speculative 
design to data visualizations, sound art and 
cosmetic surgery. Such tales enliven the book. 
But it is hard to accept ‘artsci’, as Miller terms
it, as a coherent movement. The third culture,
he shows, consists instead of exciting, experi- 
mental and mutually enriching collisions. 

Ultimately, Miller suggests, such collisions 
— once in the mainstream — become just 
‘art’. The important question that remains is
whether such art can alter the direction of 
scientific research, beyond provoking public 
discussion and debate.


Michael John Gorman is chief executive of 
Science Gallery International in Dublin. 
e-mail: michaeljohn.gorman@sciencegallery.com

Jo Robinson with vegetables from her demonstration garden in Washington state.


Q&A Jo Robinson 


The nutrient hunter 


Investigative food journalist Jo Robinson has spent more than a decade scouring the literature 
on plant nutrition. Her demonstration garden in Washington state opens this month as her book 
Eating on the Wild Side (Little, Brown, 2013) emerges in paperback. She talks about eating 
tomatoes to protect from sunburn, why bitter is better — and how purple is the new green. 


What is the thesis of Eating on the Wild Side? 
All plants make phytochemicals, which 
protect against predation, disease and other 
threats. When we consume certain plants, 
we may also receive some protection — I 
give evidence from the scientific literature 
in my book. Lycopene in tomatoes guards 
against ultraviolet light, and has been shown 
to protect us from sunburn, for example. 
A relatively new discovery is that since the 
invention of farming, we have been breeding 
varieties with progressively fewer beneficial 
phytochemicals, partly because many taste 
bitter or astringent. Fruit and vegetables with 
fewer of these compounds may offer less 
protection against hypertension, cholesterol, 
inflammation and other ills. Part of my work 
is to identify heirloom and modern varieties, 
such as Purple Peruvian potatoes, that are rich 
in phytochemicals and pleasing to the palate. 


Which fruit and vegetables should we eat? 

Purple, blue, red or black plants such 
as most berry varieties and red cabbage 
are good choices because they contain a
family of pigments known as anthocyanins.
Test-tube, animal and now a few small-scale 
human studies show that anthocyanins have 
the potential to curb the risk of cardiovas- 
cular disease by reducing inflammation, 
improving blood lipids and lowering blood 
pressure. A pilot study determined that 
anthocyanin-rich berries slowed the growth 
of cancer cells in people with colon cancer. 
Evidence is mounting that anthocyanins 
may also slow the decline of cognition and 
memory that accompanies old age. 


Is colour the only indicator of such effects? 

No: most phytochemicals are not highly 
pigmented. The drab globe artichoke has 
more antioxidant activity than more brightly 
coloured vegetables because of its high con- 
centration of colourless cynarin (which 
increases bile secretion and may protect the 
liver from carcinogens) and chlorogenic 
acid, which has antihypertensive effects. 
White onions, leeks and shallots contain an 
anticancer and flu-fighting compound called 
quercetin. Some varieties of white-fleshed 
peaches have more antioxidant activity than
yellow-fleshed peaches, even though the 
yellow varieties have more of the pigmented 
phytochemical β-carotene.


What is the best way to preserve the 
nutrients in stored vegetables? 

Once a plant is harvested it does not die 
immediately. The harvested part is still 
metabolically active and begins to burn its 
natural sugars and lose phytochemicals and 
flavour. You can slow this process by reducing 
its exposure to oxygen, storing it in the fridge 
in a sealed plastic bag with 10–20 pinpricks.


What are the best ways to cook vegetables? 

I cringe when I see people boiling vegetables, 
because the cells burst and nutrients leach 
out into the water. Lightly sautéing in oil 
is fine, but steaming is almost always best 
because it reduces exposure to water. If you 
microwave an ear of corn in the husk, you 
preserve nutrients and taste. Microwaving 
is also best for thawing berries, because it 
destroys an enzyme called polyphenol oxi- 
dase that breaks down antioxidants. 


Could biotechnology help us to breed 
more-nutritious plants? 

In my view we will never achieve the nutri- 
ent content of phytochemical-rich foods 
through genetic engineering. Say that we 
find a gene that produces cabbage with more 
cancer-fighting glucosinolates. This family of 
health-enhancing compounds is only one out 
of dozens in the vegetable, and, ultimately, it 
may not prove to be the most beneficial. But 
there is great promise in crossing wild spe- 
cies with modern ones through conventional 
breeding, which introduces myriad genes. In 
my garden I grow hybrid blackberries called 
Wild Treasure that are thorn-free and highly 
productive, but retain the nutrition and lus- 
cious flavour of wild berries. 


Why open a demonstration garden? 

I want to show people that by growing their 
own food they can choose varieties that 
will increase their odds of living longer and 
healthier lives. In my own garden there is a 
wild crab apple from Nepal that has more 
antioxidants in a single teaspoon than a 
large Honeycrisp apple. There are Indigo 
Rose tomatoes, an inky black variety rich in 
anthocyanins. And there are purple varie- 
ties of carrots, cauliflower and asparagus. 
You could say that purple is the new green. 


What is next for you? 

I may write a cookbook about findings on 
how to preserve and enhance plant phyto- 
chemical content. I also have enough data to 
write a book about nutrient-rich beverages 
such as tea, wine, coffee, whisky and beer.
Correspondence


An early champion
of women’s rights 


In his 1859 book On the Origin of
Species, Charles Darwin argues
that “all animals and plants
have descended from some one
prototype”. In none of the book's
six editions does he refer to this
common ancestor as being an
animal-like hermaphrodite
with male and female gonads,
as Kimberly Hamlin suggests
in her book on Darwinian
feminism, From Eve to Evolution
(reviewed in Nature 509, 424;
2014). Hamlin writes, for
example, that “the possibility of a
hermaphroditic past ... opened
up a new world of gendered
possibilities”.

It was the co-discoverer of 
natural selection, Alfred Russel 
Wallace, who was a public 
advocate of women’s rights.
As reported in The Times on
11 February 1909, he wrote: “All
the human inhabitants of any 
one country should have equal 
rights and liberties before the 
law; women are human beings; 
therefore they should have votes 
as well as men.” 

U. Kutschera Institute of Biology, 
University of Kassel, Germany. 
kut@uni-kassel.de 


Synthetic biology: 
missing the point 


Volker ter Meulen warns that
if environmental groups and
others exaggerate the risks
of synthetic biology it could
promote over-regulation, which
he says happened for genetically
modified organisms (Nature
509, 135; 2014). But the point
of supporting synthetic biology
is not about making sure that
science can go wherever it wants:
it is about making the type of
society people want to live in.

In the United States, for 
example, the rapid and uncritical 
introduction of genetically 
modified organisms prevented 
debate on issues such as 
alternative innovation pathways, 
and the impact on biodiversity
and pest resistance. Many believe
that these issues would have been
better addressed through earlier
and broader public discussion of
the uncertainties surrounding
transgenic organisms (see
S. Jasanoff Designs on Nature
Princeton Univ. Press; 2005).

In our view, ter Meulen 
trivializes the role of social 
scientists in suggesting that they 
could help the synthetic-biology 
debate by finding better ways 
to communicate what scientists 
think. He also implies that public 
concern over such technologies 
and their governance reflects only 
a failure to understand the science 
of risk assessment — but this 
‘deficit model’ of public concerns
has long been discredited 
(see A. Irwin and B. Wynne 
Misunderstanding Science? 
Cambridge Univ. Press; 1996). 

It is not unknown for scientists 
themselves to foster exaggeration 
and uncritical acceptance of 
claims, or to focus on anticipated 
benefits rather than on risks. 
This practice may be at the heart 
of wider public concerns about 
responsible innovation (see,
for instance, go.nature.com/zehpdp).

Sam Weiss Evans* University of 
California, Berkeley, USA. 
samuel.evans@berkeley.edu 

*On behalf of 21 correspondents 
(see go.nature.com/romzbu for
full list).


Synthetic biology: 
a global approach 


Despite some success in 
advancing best practices for 
synthetic biology in ethics, safety, 
security and the environment, the 
conversation about a global “tribal 
gathering” is only just beginning 
(see Nature 509, 133; 2014). 

In 2006, when the field 
was starting to appreciate the 
concepts and conditions for 
success, the Synthetic Biology 
Engineering Center (Synberc; 
www.synberc.org) was founded 
with support from the US 
National Science Foundation. It 
consisted of 20 investigators who 
helped to lay the foundations for
synthetic biology at a time when 
tribalism probably still did us 
some good. 

Synberc is now a singularly 
diverse tribe. It aims to examine 
the broad social context of 
biotechnology research practice 
through programmes that involve 
political scientists, legal scholars, 
ethicists, theologians, industrial 
practitioners, anthropologists 
and others, along with its own 
scientific advisory board. 

We call for global expansion 
of the Synberc model into a 
more inclusive organization 
that is committed to advancing 
responsible scientific and 
social progress in synthetic 
biology. The main US funding 
agencies and their counterparts 
worldwide need to join with 
academics, industrial partners 
and society to support this 
long-term, internationally 
coordinated effort (see also 
V. ter Meulen Nature 509,
135; 2014).

Jay D. Keasling* University of 
California, Berkeley, USA. 
keasling@berkeley.edu 

*On behalf of 16 correspondents 
(see go.nature.com/bp83hq for 
full list). 


Successes for UK 
cancer partnership 


Your examples of important 
hybrid academic-industrial 
partnerships for drug 
development come mostly from 
the United States (Nature 509, 
146; 2014). The Institute of 
Cancer Research in London has 
long benefited from such hybrid 
models. When normalized 
for each faculty member, 
our income from intellectual 
property is highest among UK 
universities and ranked in the 
top ten relative to US institutions 
(see go.nature.com/ohyuq)). 
Since 2005 we have discovered 
17 drug candidates — in several 
cases with our industry partners 
—and 7 of these have progressed 
to phase I/II clinical trials. Our 
drug abiraterone was approved 
in the United States and Europe
in 2011, and has changed clinical 
practice for treating advanced 
prostate cancer (see J. S. de Bono
et al. N. Engl. J. Med. 364, 1995–2005;
2011). Other examples
include drugs that target 
breast, lung and other cancers 
by inhibiting proteins such as 
phosphatidylinositol-3-OH 
kinase (F. I. Raynaud et al. Mol.
Cancer Ther. 8, 1725–1738; 2009)
and the molecular chaperone 
HSP90 (S. A. Eccles et al. Cancer 
Res. 68, 2850–2860; 2008).
These successes are a result 
of taking early academic risks, 
combining academic and 
pharmaceutical expertise, and 
implementing strong leadership 
and project management. Other 
contributing factors include 
running multiple projects on a
competitive scale, establishing 
long-term financial support and 
— most important — selecting 
productive and timely industrial 
collaborations. 
Paul Workman The Institute of 
Cancer Research, London, UK. 
paul.workman@icr.ac.uk


Forgotten founder 
of bibliometrics 


Besides being one of the
conceptual inventors of the
Internet (P. Ball Nature 509,
425; 2014), the Belgian librarian
Paul Otlet first coined the term
‘bibliometrics’. In his book Traité
de Documentation (1934), he
called for the foundation of a
new field, bibliométrie, which
he defined as the measurement
of all aspects related to the
publication and reading of books
and documents.

As an example, Otlet suggested 
recording how often a particular 
book or author is read. He noted 
that mathematics was becoming 
increasingly important in most 
scientific fields, including in 
biology and sociology, and felt 
that it should be included in 
library science as well. 

Ronald Rousseau KU Leuven, 
Belgium. 
ronald.rousseau@kuleuven.be
Diamond gets harder


Composite materials that incorporate diamond are among the hardest in the world, but fail under extreme conditions. 
A nanostructured form of diamond, made from onion-like carbon precursors, might overcome this problem. SEE LETTER P.250


JAMES BOLAND 


Diamond is a famously strong
material with outstanding
properties, such as high
wear resistance and hardness. For 
this reason, it has long been used 
in cutting and drilling tools, but 
poor thermal stability has limited 
its application. On page 250 of 
this issue, Huang et al. report the
synthesis of ‘nanotwinned’ dia- 
mond, in which nanometre-scale 
crystals share some lattice points. 
The authors find that the resulting 
material is much harder and more 
thermally stable than naturally 
occurring diamond. 

The ancient Egyptians may have 
been the first to use diamonds in 
tooling, although the evidence for 
this is unsubstantiated. But rock 
drilling with diamonds has been 
more reliably dated to the eighteenth
century. The need for high-
strength, hard-wearing drill bits for 
industrial drilling and oil explora- 
tion led to the development of a new class of 
superhard material in the 1980s consisting of 
diamond grains bonded with metallic cobalt. 
The main disadvantage of these materials 
is that the cobalt catalyses the breakdown of 
diamond to graphite at temperatures above 
700°C. A diamond composite was developed 
around the same time, in which the cobalt
binder was replaced by a ceramic material, sili- 
con carbide, and was shown to be stable under 
harsh and severely abrasive rock-cutting con- 
ditions to temperatures in excess of 1,200°C. 
However, this thermally stable diamond com- 
posite material has yet to be widely adopted 
as a cutting element in tools for the mining, 
drilling and manufacturing industries for 
reasons of cost. 

A major drawback of diamond-based com- 
posites has been their low fracture toughness 
(a measure of resistance to crack propagation), 
which can cause them to fail catastrophically. 
Harder diamond composites, which have 
higher concentrations of diamond, have lower 
fracture toughness. Nevertheless, these materials
have high wear resistance, and have formed
the basis of long-lasting tools for industrial use,
provided that the mechanical loading on them
is controlled.

Figure 1 | Computer model of an onion carbon nanoparticle. Huang
et al. used such nanoparticles to make an ultrahard, nanostructured form
of diamond. (Image taken from ref. 1.)

Hardness is not governed by composition 
alone; the grain size of the constituent phases 
of the materials is also a factor. For hard and 
brittle materials such as diamond composites, 
hardness and strength increase with decreasing
grain size, as expressed by the Hall–Petch
relationship. Normally, such improved
hardness is accompanied by a decrease in 
fracture toughness; this inverse relationship 
was a generally accepted model until nano- 
structured materials were thoroughly investi- 
gated for their mechanical properties. In such 
materials, the inverse relationship no longer 
holds when the grain size is less than about 
100 nanometres, and fracture toughness can 
actually increase with decreasing grain size.
These materials, including diamond com- 
posites with constituents that have nanoscale 
grains, have been shown to have outstanding 
fracture toughness. 
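
For readers who want the relationship in symbols, the Hall–Petch relationship mentioned above is commonly written in the following textbook form. This is a sketch rather than the expression used by Huang et al.: the symbols H_0 (the hardness of a coarse-grained reference material) and k (a material-dependent strengthening coefficient) are assumptions introduced here for illustration, and d is the mean grain size.

```latex
% Textbook Hall-Petch form: hardness rises as grain size d falls.
% H_0 and k are illustrative material constants, not values from the article.
H(d) = H_0 + \frac{k}{\sqrt{d}}
```

In this form, halving the grain size raises the grain-size-dependent term by a factor of about 1.4, which is why ever-finer microstructures are attractive for superhard materials.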

Grain-size-reduction techniques for 
improving the fracture toughness of ultra-hard 
materials have proved useful, but
seem to have been limited by either 
the material involved or the tech- 
nology used. Further improve- 
ments to such materials therefore 
seemed unlikely, unless alloyed 
nanograined materials with even 
higher intrinsic hardness could be 
discovered. However, Huang and
colleagues demonstrate that a further
reduction in the crucial hardness-related
length scale of grains is
achievable. 

Researchers from the same group 
had previously reported a process
for making a nanotwinned form of 
boron nitride — a material with a 
diamond-like atomic arrangement. 
They therefore decided to mimic 
that process with diamond, by sub- 
jecting carbon nanoparticles consist- 
ing of concentric graphite-like shells 
(known as onion carbon nanoparticles;
Fig. 1) to pressures in the range
of 18–25 gigapascals at temperatures
of 1,850–2,000 °C. The resulting
transparent material consisted of 
nanotwinned, nanocrystalline diamond. 

The hardness of Huang and co-workers’ 
material reached about 200 GPa; for compari- 
son, hardness values for single-crystal dia- 
monds range from 60 to 130 GPa, and those of 
nanocrystalline diamonds without nanotwins 
are 130–145 GPa (ref. 8). Another outstanding
property is its high fracture toughness,
which is greater than that of other commer- 
cially available diamond composite materials. 
Remarkably, the nanotwinned diamond was 
stable against oxidation in air at temperatures 
above 1,000 °C — higher than the authors 
expected. 

Huang et al. prepared millimetre-sized 
pieces of their material on a laboratory scale, 
but it remains to be seen whether their process 
can be used on an industrial scale. Success will 
depend in part on whether starting materi- 
als of sufficiently high quality can be made. 
Nanocrystalline diamond has previously 
been sintered — fused at high temperature 
and/or pressure — to manufacture anvils that 
are used for high-pressure, high-temperature 
phase studies of geological materials, and
similar scientific applications could be predicted
for nanotwinned diamond. However,
the material's creep (the tendency of a material 
to deform permanently in response to long- 
term mechanical stresses) and fatigue proper- 
ties need to be measured. If the deformation 
mechanism changes from one that is based on 
crystallographic defects to one based on slid- 
ing at grain boundaries, as commonly occurs 
during ‘superplastic’ deformation when solid 
materials are heated, then methods for pinning 
grain boundaries would be required.

Nanodiamonds have progressed over the 
past decade or so from being speculative curi- 
osities to fully functioning materials useful 
for a broad range of applications. Individual 
nanoparticles consisting of only a few hun- 
dred carbon atoms arranged into the diamond 
structure are being used in such diverse areas 
as drug delivery, bioimaging and tissue generation.
Nanodiamonds, either aggregated
or disaggregated in lubrication fluids, can also 
form low-friction interfaces that reduce wear 
on moving components at both the macro- and 
microscale.

Equally important is the innovative and 
rapidly developing research on the consolida- 
tion and sintering of nanodiamonds to make 
solid composite materials that have a wide 
range of remarkable properties, such as high 
thermal conductivity, optical transparency, 
chemical inertness and high tolerance to radia- 
tion damage. These composites were initially 
produced on scales barely higher than that of 
the nanoparticles themselves, but extraordinary
progress in high-pressure, high-temperature
technology now means that the materials
can be produced at sizes that have applications 
across several industries. The incorporation of 
nanotwinned, nanocrystalline diamonds into 
composites might lead to materials that have 
even more extraordinary properties.


James Boland is in the Division of
Earth Science and Resource Engineering,
Commonwealth Scientific and Industrial
Research Organisation (CSIRO), Pullenvale,
Queensland 4069, Australia.
e-mail: jim.boland@csiro.au
Pass the ammunition


Tomato plants that have been damaged by herbivorous insects emit airborne 
chemicals that warn neighbours of an impending attack. It emerges that the 
receiving plants transform these signals into defensive weapons. 


MARK C. MESCHER & 
CONSUELO M. DE MORAES 


Plants may seem passive, but in fact they
respond in complex ways to diverse
features of their environment. It is
becoming increasingly clear, for example,
that plants perceive and respond to environmental
odours. However, almost nothing is
known about the mechanisms by which plant
olfaction occurs. Writing in Proceedings of
the National Academy of Sciences, Sugimoto
et al. report that when plants are exposed to
odours emitted by neighbours that have been
damaged by herbivorous insects, they react by
transforming compounds in the odour into
effective anti-herbivore defences.

When insects feed on plant tissues, the
assaulted plant can exhibit a range of physiological
responses. For example, it may produce
chemical toxins and feeding deterrents, or
emit airborne volatile compounds that attract
natural enemies of the feeding herbivores,
such as insect predators and parasitoids
(for instance, parasitic wasps, which lay their
eggs in plant-feeding caterpillars). It has
become widely accepted that plants can
also use volatile emissions released by damaged
neighbours as cues to prepare their own
defences against an impending attack, an
idea that had previously been controversial.

Sugimoto et al. investigated the mechanisms 
by which volatile signalling between culti- 
vated tomato plants influences their defence 
against larvae of the moth Spodoptera litura, 
an agricultural pest also known as the com- 
mon cutworm. Using an experimental set-up 
in which airflow between individual plants was 
carefully controlled, the authors showed that 
exposure to the odours released by cutworm- 
damaged tomato plants significantly enhanced 
the ability of neighbouring plants to resist a 
subsequent attack. Cutworm larvae placed on 
plants that had been exposed for three days to 
the odours released by damaged plants showed 
both reduced growth and increased mortality 
compared with larvae placed on unexposed 
control plants. 

The authors’ extensive biochemical analy- 
ses of tomato-leaf tissues revealed that plants 
exposed to volatiles that had been released 
by damaged neighbours had greatly elevated 
levels of a single compound, (Z)-3-hexenylvicianoside,
or HexVic. Furthermore, they
found that cutworm larvae fed an artificial 
diet laced with HexVic showed significantly 
reduced growth compared with larvae 
reared on an untainted diet, confirming that
accumulation of HexVic is involved in the
tomato plants’ increased resistance. Sugimoto
and colleagues were struck by the remarkable
chemical similarity between HexVic and
(Z)-3-hexenol, a component of the odour
emitted by damaged tomato plants. This suggested
that the transmission of (Z)-3-hexenol
from damaged to undamaged plants might
provide the raw material for HexVic production,
rather than that the volatile was acting
simply as a chemical signal (Fig. 1).

Figure 1 | Plant odours as alarm signals. When plants are attacked by herbivorous insects, they release
volatile compounds that can attract insect predators and deter further herbivory. These volatile odours also
enhance the defences of neighbouring plants against attack. Sugimoto et al. report that tomato plants can
directly transform (Z)-3-hexenol, a volatile compound released by their damaged neighbours, into
(Z)-3-hexenylvicianoside (HexVic), an effective defence compound that reduces the growth and survival
of the herbivores. (Figure adapted from drawings by Nick Sloff and Thomas Degen.)

To test this, the authors exposed tomato 
plants to airborne (Z)-3-hexenol tagged with 
the hydrogen isotope deuterium, and found 
that all of the HexVic subsequently produced 
was labelled with the isotope. Furthermore, 
they found no indication that exposure to vol- 
atiles caused the plants to produce any extra 
(Z)-3-hexenol. It seems, therefore, that the 
accumulation of HexVic in undamaged tomato 
plants is entirely dependent on the uptake of 
(Z)-3-hexenol from the atmosphere. However, 
once plants are attacked by herbivores, their 
own (Z)-3-hexenol production seems to drive 
further accumulation of HexVic. 

The remarkable mechanism of defence 
induction documented by Sugimoto et al. 
may be widespread in plants. (Z)-3-Hexenol 
is often found in ‘green-leaf’ volatiles, which 
most plants emit in the immediate aftermath of 
tissue damage (this compound contributes 
to the characteristic odour of freshly cut 
grass), and the biochemical transforma- 
tion of (Z)-3-hexenol to HexVic is relatively 
simple and probably quite common. Indeed, 
the authors present evidence that HexVic, 
or chemically similar compounds, accumu- 
late in many plant species after exposure to 
green-leaf volatiles. 

This mechanism may also illuminate a 
plausible scenario for the evolution of plant- 
to-plant communication. From an evolution- 
ary perspective, it is difficult to understand 
the benefit of signals that send reliable warn- 
ings of impending herbivory to neighbouring 
plants that will frequently be competitors of 
the emitter. Consequently, previous work
has suggested that plant-to-plant signal- 
ling might emerge as a by-product of vola- 
tile-signalling systems in individual plants. 
Consistent with this hypothesis, it is easy 
to envisage the mechanisms described by 
Sugimoto and colleagues as having initially 
evolved to regulate the accumulation of plant- 
defence compounds in distant parts of the 
emitting plant. 

Even if neighbouring plants are not the 
intended receivers of green-leaf volatiles, 
they probably benefit from this intrinsic 
ability to produce HexVic from atmospheric 
(Z)-3-hexenol, because the levels of this 
compound in the air are likely to reflect the 
intensity of local herbivore activity. This effect 
might well have broader ecological implica- 
tions, because an increase in volatile-induced 
resistance in local plant communities that 
grows in proportion to the size and intensity
of emerging insect infestations could act as a 
brake on the spread of the infestation. 

Despite the far-reaching implications of 
this study, it is unlikely to be the last word 
on plant-to-plant signalling or plant olfac- 
tion. For example, Sugimoto et al. found no 
link between the mechanisms they observed 
and the expression of jasmonic acid (a plant 
hormone that regulates many defence mechanisms
activated by herbivory), although priming
of jasmonic-acid-mediated defences by
volatiles has been reported in different species. 
Moreover, other classes of plant compounds 
have been implicated in plant–plant interactions,
and a study last year also reported
plant-defence responses to insect odours that 
bore little or no resemblance to the typical 
emissions of plants. 

It is unclear whether the mechanisms 
underlying other plant responses to environ- 
mental odours will prove as straightforward 
as the uptake and subsequent conversion of 
(Z)-3-hexenol to HexVic in tomato plants, or 
to what extent they parallel the far greater complexity
observed in animal olfactory systems.
But it is certain that many more fascinating and
unexpected facets of plant olfaction remain to
be discovered.
When lymphocytes
run out of steam 


The finding that absence of the enzyme CTPS1 underlies a form of human 
immunodeficiency highlights the role of metabolism in immune responses 
and suggests avenues for treating diseases such as leukaemia. SEE LETTER P.288 


ANDRE VEILLETTE & DOMINIQUE DAVIDSON 


Lymphocytes play a crucial part in protection against
microorganisms such as viruses and bacteria.
The two main types of lymphocyte are T cells 
and B cells, which, in the presence of antigen 
molecules derived from microbes, undergo a 
series of molecular changes that induce a state 
of activation. This response is driven by anti- 
gen receptors on the cells’ surface, and leads 
to rapid cell proliferation and augmented 
immune protection (Fig. 1a). Proliferation
under these conditions depends on metabolic 
adaptation, which allows immune cells to 
synthesize DNA and RNA molecules and the 
proteins needed for cell division. In this issue,
Martin et al. (page 288) link this metabolic
requirement of proliferating lymphocytes to a 
newly described immune-deficiency disease. 
The authors describe children from sev- 
eral unrelated families who developed a 
severe immunodeficiency at birth or at a very 
young age. In some cases, two members of the
same family had the deficiency. Typically, the 
patients exhibited severe and persistent infec- 
tions with viruses such as Epstein-Barr, the 
cause of infectious mononucleosis, and vari- 
cella zoster, which causes chickenpox and her- 
pes zoster (shingles). Severe infections from 
bacteria such as pneumococcus, a cause of 
pneumonia, were also noted. Several children 
underwent transplantation with haemato- 
poietic stem cells (which can differentiate into 
all types of blood cell) to control the infections. 
Immunological investigations led the authors 
to propose that the patients might be suffer- 
ing from an inherited immunodeficiency that 
compromises lymphocyte function. 
Sequencing of DNA from the affected 
children revealed that they all carried a muta- 
tion in CTPS1, the gene encoding the enzyme 
cytidine nucleotide triphosphate synthase 1 
(CTPS1), that resulted in an absence of this 
enzyme in the patients’ lymphocytes. CTPS1 
is one of two forms of CTP synthase enzymes
produced in mammalian cells (the other
being CTPS2); both enzymes enable the
production of cytidine nucleotide triphosphate
(CTP), a nucleotide required for cellular DNA
and RNA synthesis.

Figure 1 | Regulation of lymphocyte activation by CTPS1. a, In normal lymphocytes (T and B cells),
stimulation of the cells’ antigen receptor triggers a series of molecular changes that induce the cells to
proliferate, fuelling the immune response. Martin et al. show that these events include an increase in
levels of the enzyme CTPS1 and its product, CTP, which supports the increased DNA synthesis required
for cell proliferation. b, In lymphocytes from CTPS1-deficient humans, stimulation by antigen results
in some, but not all, of the molecular changes associated with lymphocyte activation. In particular, there
is no increase in CTP levels in the activated cells, resulting in compromised DNA synthesis, reduced
lymphocyte proliferation and an impaired immune response.

The authors show that normal lymphocytes 
express both CTPS1 and CTPS2: CTPS1 is 
present at low levels before lymphocyte activa- 
tion and becomes markedly expressed in acti- 
vated lymphocytes, whereas CTPS2 is already 
expressed at high levels in non-activated 
lymphocytes. Analyses of T and B cells from the 
CTPS1-deficient patients revealed that differen- 
tiation of the cells in the absence of activation by 
foreign microorganisms was largely unaffected 
by the mutation, and the immediate molecular 
changes triggered by antigen-receptor stimula- 
tion were mostly unaltered. However, the cells’ 
capacity to synthesize DNA and proliferate 
following stimulation of the antigen recep- 
tor was severely compromised. Intracellular 
levels of CTP were also very low. These defects 
were reproduced when CTPS1 expression was 
artificially reduced in normal lymphocytes, 
or when 3-deazauridine, a pharmacological 
inhibitor of CTPS enzymes, was used to sup- 
press their activity. Conversely, the defects 
were corrected when CTPS1 was introduced 
into cells of CTPS1-deficient patients by retro- 
virus-mediated gene transfer, or when CTP was 
added to the cells’ culture medium. 

These findings show that CTPS1 and its 
product, CTP, are required for lymphocytes 
to proliferate intensely during antigen- 
induced activation, further highlighting the 


importance of rapid metabolic adaptation for 
proper immunity. In the absence of CTPS1, 
antigen-stimulated lymphocytes do not pro- 
duce sufficient quantities of CTP, causing 
defects in DNA synthesis and cell prolifera- 
tion (Fig. 1b). These effects explain in large 
part why CTPS1-deficient children develop 
life-threatening viral and bacterial infections. 

In addition to identifying the genetic cause 
of a new immunodeficiency, Martin and col- 
leagues’ results raise several prospects for 
future investigation. They indicate that even 
though CTPS2 is expressed in lymphocytes, 
it cannot replace CTPS1 when the latter is 
deficient. One possible explanation for this 
is that CTPS1 is much more active than 
CTPS2 in lymphocytes, possibly owing to dif- 
ferences in intrinsic activity or abundance, 
or to modifications such as phosphorylation 
or co-factor binding that could influence the 
enzymes’ activity. For instance, in mammalian
cells, CTPS1, but not CTPS2, can be regulated 
by phosphorylation on certain amino-acid 
residues”. Alternatively, because it has been 
reported that CTPS1 and CTPS2 can be local- 
ized in both the cytoplasm and the nucleus’, it 
is possible that CTPS1 accumulates in a locale 
that is especially important for generating 
the CTP needed for intense DNA synthesis. 
Moreover, these enzymes form tetramers and 
can exist as large filamentous structures”; 
whether differences in these arrangements 
exist between CTPS1 and CTPS2 remains to 
be clarified. 

Most studies of CTPS enzymes have focused 
on their capacity to promote DNA and RNA
synthesis. However, investigations of the two 
yeast homologues of CTPS enzymes, URA7 
and URA8, indicated that these enzymes are
also needed for the synthesis of phospho- 
lipids’. It is possible that the use of CTP 
during the synthesis of membrane phospho- 
lipids is needed for the interaction of signal- 
ling molecules with the inner leaflet of the 
cell membrane. If such an activity exists for 
mammalian CTPS enzymes, defects in mem- 
brane-driven signalling could contribute to 
the lymphocyte dysfunctions observed in 
CTPS1-deficient patients. This activity could 
also explain Martin and colleagues’ obser- 
vation that CTPS1-deficient human T cells 
have reduced activation of the enzyme Erk 
kinase and reduced expression of the signal- 
transmission proteins CD25 and CD69 fol- 
lowing antigen-receptor stimulation. These 
molecular events occur at early stages in cell 
activation, before the initiation of DNA syn- 
thesis, and are crucial for productive lympho- 
cyte activation. 

The data also raise the provocative possibility 
that pharmacological inhibitors of CTPS1 
could be useful tools for treating human 
diseases associated with excessive or uncon- 
trolled lymphocyte proliferation, such 
as transplant rejection, graft-versus-host 
disease and some forms of cancers such as 
leukaemia and lymphoma. In keeping with the 
latter idea, the CTPS inhibitor 3-deazauridine 
has already been shown to display some 
therapeutic efficacy against leukaemic cells 
in vitro, although it probably also inhibited 
targets other than CTPS in these cells'’. The 
development of more-specific inhibitors of 
CTPS1 will aid the further investigation of this 
possible therapeutic avenue.

PARTICLE PHYSICS


The hunt for Majorana 
neutrinos hots up 


Finding that neutrinos are their own antiparticles would revolutionize particle 
physics. A high-sensitivity technique accelerates the search for the nuclear-decay 
process that would enable such a discovery. SEE ARTICLE P.229 


DAVID WARK 


Discovering a new class of fundamental
particle is about the biggest bang you
can make in particle physics. The
discovery of the Higgs boson was so excit- 
ing partly because it is a fundamental parti- 
cle with no spin, the first ever seen. Physicists 
have long tried to resolve whether the familiar 
neutrino actually belongs to a class of exotic 
entities called Majorana particles, which are 
their own antiparticles. Majorana neutrinos 
might also help to explain why neutrinos are 
so light, and could be a clue to how the Uni- 
verse came to contain so much more matter 
than antimatter. Almost the only practical 
way to reveal Majorana neutrinos would be to 
observe the rare nuclear process called
neutrinoless double-β decay. On page 229 of this
issue, the EXO-200 Collaboration¹ announces
the result of a high-sensitivity technique to 
search for this decay. Their results show the 
power of their technique, but demonstrate that 
there is still much work to do in the search for 
Majorana neutrinos. 

Double-β decay, as the name implies, occurs
when a nucleus undergoes two β decays simultaneously
and so emits two β-particles (electrons
or antielectrons). This is realistically
observable only in the few nuclei for which
single-β decay would lead to a heavier daughter
nucleus but two β decays would lead to a
lighter one. The standard model of particle
physics allows two-neutrino double-β decay
(2νββ decay), which is just two ordinary
β decays occurring at the same time, leading
to the emission of two electrons and two
antineutrinos. But what if, as many theorists
believe, neutrinos are Majorana? Then another
type of decay could also occur, in which the
antineutrino emitted from one of the β decays
could be absorbed as a neutrino in the other
β decay, and the process seen from outside the
nucleus would just be the emission of the two
electrons, or zero-neutrino double-β decay
(0νββ decay).

0νββ decay has a beautiful experimental
signature — the simultaneous emission 
of two electrons with a total energy that 
sums to the difference in mass between the 
parent and daughter nuclei. Furthermore, 
it can happen only if neutrinos have mass. 
Owing to the nature of the weak interactions
that cause β decay, the emitted antineutrino
will have a spin that is right-handed with 
respect to its direction of motion, and the 
absorbed neutrino must be left-handed. This 
coexistence of the different spin states happens 
only for particles that have mass, and therefore
makes the observed rate of 0νββ decay a
sensitive probe of the mass of the neutrino.
0νββ decay is therefore a ‘two for the price of
one’ measurement, which is one reason that 
so many groups are attacking it with different 
techniques and different nuclei. 

The EXO (Enriched Xenon Observatory)-200
team has now moved near to the front of
the pack with its liquid-xenon ‘time-projection
chamber’ detector (Fig. 1). The active part of
the detector is 110 kilograms of liquid xenon
(which acts both as the double-β-decay source
and the detector) enriched to 80.6% in the
potentially double-β-decaying nucleus xenon-136
(¹³⁶Xe). Charged particles (either the
electrons emitted in double-β decay or, much
less desirably, background events from contaminant
nuclei in the detector or surroundings)
produce ionization in the xenon, and
the resulting charges are drifted in an applied
electric field to two crossed grids of wires.
The grids determine the
position of the charges in two dimensions. The
charged particles also make the xenon emit
scintillation light, and measuring the time difference
between that light and the arrival of
the charge at the wires gives the drift distance
(hence the name time-projection chamber),
allowing reconstruction of the tracks from the
charged particles in three dimensions.

Figure 1 | Chasing Majorana neutrinos. The EXO-200 Collaboration¹ has used a liquid-xenon
‘time-projection chamber’ detector, half of which is seen here under construction, to search for the
neutrinoless double-β-decay process that would arise if neutrinos are Majorana-type particles. The
two crossed grids of wires that were used to determine the position of charged particles in the detector,
which is roughly 40 cm in diameter, lie above the frame of empty circular holes that were later filled with
scintillation-light detectors.

This reconstruction is crucial because, as
in all double-β-decay experiments, the main
challenge in EXO-200 is suppressing the background
events. The first step towards this was
to minimize the background from cosmic
rays by locating the detector deep underground
in the Waste Isolation Pilot Plant site
near Carlsbad, New Mexico. More-insidious
background events arise from radioactive
contaminants in and around the detector, so
all materials that make up the detector were
carefully selected to have the lowest possible
levels of radioactive contamination, and the
detector itself is heavily shielded. Background
events still persist, however, so another advantage
of the time-projection chamber comes
to the fore. By looking for events in which
almost all of the charge is deposited at a single
place in the detector (as one would expect for a
double-β decay) and rejecting those in which it
comes from more than one place (as would be
expected for γ-rays from radioactive contaminants),
the background is suppressed by another
large factor. Having these separate sets of data
also helps to validate the properties of the detector,
which is calibrated with an extensive set of
measurements using radioactive sources.

The EXO-200 team has now reported the
results of two years of running its experiment.
The main output of the detector is a spectrum
showing how often a particular energy was
deposited within it. 2νββ decay produces a
continuous energy spectrum that dominates
the low-energy part of the observed range
(see Fig. 4 of the paper¹), showing the success
of the background suppression. However, no
statistically significant sign of a concentration
of energy deposits around the energy expected
from 0νββ is apparent. The authors conclude
that the half-life for 0νββ decay is greater than
1.1 × 10²⁵ years (at the 90% confidence level).
Long lifetimes correspond to a low probability
that the neutrino will flip spin, and so indicate
small neutrino masses.
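
To give a feel for what a half-life limit of this size means, the expected number of decays in a sample can be estimated as N × ln 2 × t / T½ whenever the running time t is far shorter than the half-life T½. The sketch below applies that estimate to the figures quoted in this article. It assumes, purely for illustration, that all 110 kg of xenon are used for the full two years, that the 80.6% enrichment can be treated as a mass fraction with a molar mass of 136 g, and that every decay is detected; the function name and these simplifications are not taken from the paper.

```python
import math

AVOGADRO = 6.02214076e23  # atoms per mole


def expected_decays(mass_kg, isotope_fraction, molar_mass_g,
                    live_years, half_life_years):
    """Expected decay count, valid when live_years << half_life_years."""
    # Number of candidate nuclei in the sample (enrichment treated as a mass fraction).
    n_atoms = mass_kg * 1000.0 * isotope_fraction / molar_mass_g * AVOGADRO
    # For t << T_half, the decayed fraction is approximately ln(2) * t / T_half.
    return n_atoms * math.log(2) * live_years / half_life_years


# Illustrative inputs: full active mass, two years, perfect detection efficiency.
n = expected_decays(mass_kg=110.0, isotope_fraction=0.806, molar_mass_g=136.0,
                    live_years=2.0, half_life_years=1.1e25)
print(f"about {n:.0f} decays at a half-life of 1.1e25 years")
```

Under these idealized assumptions a half-life at the quoted limit corresponds to only a few tens of decays in the whole run, and a real analysis would see fewer once fiducial-volume cuts and detection efficiency are applied. That is why even a small number of background events in the energy window matters so much.
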

These results are bad news for a previous
claim² of a positive signal for 0νββ in the
decay of germanium. However, that claim
had already been put under strain by earlier
results³ from EXO-200, by KamLAND-Zen
(another ¹³⁶Xe experiment in Japan based on
a different technique)⁴ and by the GERDA
experiment⁵ on double-β decay in germanium.
In fact, the earlier results of EXO-200 had set
the tighter lower limit on the 0νββ half-life of
1.6 × 10²⁵ years, and KamLAND-Zen a limit
of 1.9 × 10²⁵ years. But in both those cases the
limit was better than expected because the
number of 0νββ candidate events seen was
actually smaller than expected on the basis of
the background events alone.

The new EXO-200 result, which is based on
3.8 times more data than its first result³, actually
has a poorer limit, because now the researchers
see a 1.2-standard-deviation excess over background
in the region of the 0νββ-decay energy
line; that is, they see slightly more events than
their background model would predict. Such
an excess of events would show up almost once
every six times just from statistical fluctuations.
However, as we saw with the search for the Higgs
boson, a real signal would show up initially as a
statistically insignificant excess. The question
is whether the excess will grow with further
data, and for that we must wait for more results
from all the experiments. Understanding the
nature of neutrinos is pivotal to models of particle
physics and cosmology, so, especially now
that experiments on neutrino oscillations⁶ have
shown that neutrinos have mass, the search
must continue.
Enzyme meets a
surprise target 


An enzyme previously implicated in gene regulation has now been found to have 
a role in cancer progression, potentiating an intracellular signalling pathway that
is driven by a mutated K-Ras protein. SEE LETTER P.283 


MARIAN M. DEUKER & MARTIN MCMAHON 


Activating mutations in the gene KRAS,
which encodes a member of the Ras
protein family, are implicated in the
development of many human cancers’. How- 
ever, because drugs that effectively treat these 
cancers by targeting the K-Ras protein have 
proved difficult to develop, the search for 
potential therapeutic targets has turned to 
the proteins that are activated downstream of 
this oncoprotein. In this issue, Mazur et al.” 
(page 283) identify the enzyme SMYD3 as a 
protein that has an unanticipated role in the 
progression of K-Ras-driven cancers in mice. 

SMYD3, a lysine methyltransferase, is 
frequently overexpressed in human can- 
cers~’. Previous work’ has indicated that the 
enzyme primarily acts in the nucleus, adding 
methyl groups to lysine amino-acid residues 
on histones (proteins that organize DNA into 
bundles called nucleosomes). To investigate 
the role of SMYD3 in K-Ras-driven cancers, 
Mazur and colleagues used several techniques, 
spanning the gamut from genetically engi- 
neered mouse models to screens for SMYD3 
protein substrates. 

The authors found that in mouse models 
of K-Ras-driven cancer, SMYD3 acts in the 
cytoplasm of cancer cells, methylating a lysine 
residue (K260) on MAP3K2, a kinase enzyme 
that is associated* with the activation of sev- 
eral stress-induced pathways. They report that 
methylation of MAP3K2 potentiates a cellular 
signalling pathway that is involved in many 
human cancers’, the MEK-ERK mitogen- 
activated protein-kinase pathway. This path- 
way, which activates gene transcription in the 
nucleus, is composed of a cascade of enzymes 
that are sequentially activated in the presence 
of oncogenic K-Ras. Typically, K-Ras activates 
Raf; Raf activates MEK; and MEK activates 
ERK. Mazur and colleagues’ work demon- 
strates for the first time that MAP3K2 can also 
potentiate MEK-ERK signalling downstream 
of oncogenic K-Ras (Fig. 1). 

The researchers show that SMYD3-mediated 
methylation of MAP3K2 has no direct effect 
on the protein's intrinsic enzymatic activity. 
Instead, methylation promotes dissociation of 
MAP3K2 from its negative regulator, the PP2A 
phosphatase enzyme complex. Consistent
with a role for SMYD3 in tumour progression,
Mazur and co-workers found that silencing
of SMYD3 lengthens the median lifespan of
mice that have been genetically engineered
to express oncogenic K-Ras in the lungs or
pancreas. Interestingly, they saw this effect only
in mice with late-stage cancer progression;
SMYD3 silencing did not influence tumour
initiation. Strikingly, administration of a PP2A
inhibitor restored tumour progression in the
SMYD3-deficient mice. However, the authors
did not test the effects of SMYD3 silencing in
mice with aggressive cancers, which would
more accurately model the human disease.

Figure 1 | SMYD3 potentiates signalling in
K-Ras-driven cancer. Mutational activation
of KRAS causes increased signalling through
the MEK-ERK pathway, leading to cancer. In
this pathway, K-Ras (which is bound by a GTP
molecule) activates the kinase enzyme Raf,
leading to a kinase cascade in which MEK1 and
MEK2 and then ERK1 and ERK2 are activated
by phosphorylation (P). ERK1 and ERK2 then
activate genes involved in K-Ras-driven tumour
progression. Mazur et al. report² that the lysine
methyltransferase enzyme SMYD3 is involved in
K-Ras-driven tumour progression. The enzyme
acts by adding methyl groups (Me) to a lysine
amino-acid residue of the kinase MAP3K2, causing
it to dissociate from its negative regulator PP2A,
and further activating the MEK-ERK pathway.
However, it remains unclear precisely how SMYD3
and MAP3K2 are linked to K-Ras (dashed arrows).

In sum, Mazur et al. propose that elevated 
SMYD3 expression promotes MAP3K2 
methylation, freeing the protein from the 
inhibitory constraints of PP2A and thus 
potentiating signalling downstream of onco- 
genic K-Ras. Interestingly, methylation has 
previously been shown’ to play a part in the 
regulation of Raf-MEK-ERK signalling — 
methylation of the Raf proteins B-Raf or C-Raf 
promotes their degradation, thereby inhibit- 
ing MEK-ERK signalling. The fact that protein 
methylation has now emerged as both a posi- 
tive and a negative regulator of MEK-ERK- 
pathway activation suggests that this might be 
a common strategy for the regulation of intra- 
cellular signalling pathways. 

Mazur and colleagues’ work raises the 
question of how SMYD3 links MAP3K2 and 
oncogenic K-Ras. It is unclear why SMYD3 
levels are elevated in Ras-mutated cancers, 
such as colorectal’ and pancreatic cancer’, 
although it is possible that Ras regulates 
SMYD3 expression through effects on either 
gene transcription or protein stability. Despite 
early claims’ that MAP3K2’s sister enzyme 
MAP3K1 can bind directly to activated Ras
proteins, it seems unlikely that MAP3K2 inter- 
acts directly with oncogenic K-Ras. 

Although it is well established that MEK- 
ERK signalling is involved in K-Ras-driven 
cancer, this pathway’s dependence on different 
MAP3K enzymes (B-Raf, C-Raf and MAP3K2)
is influenced by both the stage of the cancer 
and the tissue from which it originated**"””. 
Thus, the differing effects of SMYD3 silencing 
on early- and late-stage cancer progression 
may arise because of differing levels of MEK- 
ERK-pathway activation. Perhaps the activa- 
tion of MAP3K2 is tied to a signalling circuit 
that, although not essential for the activation 
of MEK-ERK signalling in early-stage cancers, 
is required for the elevated activation of this 
pathway in more-advanced tumours". Given 
the number of growth factors and other sig- 
nalling molecules whose secretion is reported 
to be elevated in K-Ras-driven cancers!*"'°, 
oncogenic K-Ras may promote MAP3K2 acti- 
vation through a feed-forward loop driven by 
intra- or intercellular signalling proteins, such 
as members of the epidermal-growth-factor 
family. The relationship between oncogenic 
K-Ras and SMYD3 is intricate and remains 
unclear. Moreover, at least one complication 
with this model is the observation that the pro- 
tein kinase MKK7, a direct downstream target 
of MAP3K2, suppresses K-Ras-initiated lung- 
tumour progression”. 

Might patients with late-stage K-Ras-driven 
lung or pancreatic cancer benefit from drugs 
that target SMYD3 or MAP3K2? Mazur et al. 
demonstrate that SMYD3 silencing slows the 
proliferation of human K-Ras-mutated cancer 
cells. Furthermore, they found that silencing 
of SMYD3 enhanced the antitumour effects of 
trametinib, a MEK-ERK-pathway inhibitor, in 
both cultured cells and genetically engineered
mice. Importantly, mice lacking SMYD3 or 
MAP3K2 seem normal””, suggesting that 
inhibitors of these proteins do not have severe 
side effects, at least in mice. 

Efforts to develop drugs that inhibit SMYD- 
family lysine methyltransferases or MAP3K2 
are already under way’””’. However, inhibi- 
tion of MAP3K2 alone may not be an effec- 
tive way to dampen MEK-ERK signalling 
because other MAP3K enzymes might be able 
to perform the role of MAP3K2 in its absence. 
Combined inhibition of SMYD3 and MAP3K2 
may therefore produce more promising results. 

Despite Mazur and colleagues’ claims about 
the specificity of SMYD3 for MAP3K2, pre- 
vious work’ suggests that SMYD3 also acts 
as a histone modifier, regulating RNA poly- 
merase II, the enzyme responsible for cata- 
lysing gene transcription. Thus, inhibition of 
SMYD3 might cause unexpected side effects. 
Nevertheless, the authors’ work has expanded 
our understanding of the regulatory role of 
protein methylation in intracellular signal- 
ling. We must wait to see whether this regula- 
tory mechanism can be manipulated to treat 
patients with K-Ras-mutated cancers.

Repair and replace


One approach to treating inherited diseases is repairing the defective genes, but 
this has proved challenging in stem cells. An optimized protocol has now been 
developed that allows gene repair in blood-cell precursors. SEE ARTICLE P.235 


ALAIN FISCHER 


For more than 40 years¹, gene therapy has
been predicted to be a way to cure inherited
diseases that are caused by defective
copies of a gene. The typical approach to
gene therapy is to add a functional copy of the 
mutated gene to the genome, using a geneti- 
cally engineered retroviral vector to transport 
the copy into diseased cells (Fig. 1a). Although 
gene addition has been used to treat certain 
inherited diseases, such as severe combined 
immunodeficiencies””, its success has been 
mitigated by side effects". A different strategy, 
based on repair of the defective gene, is there- 
fore an attractive alternative. On page 235 of 
this issue, Genovese et al.° report progress in 
the optimization of such a strategy. 
Gene-repair techniques rely on artificial 
nuclease enzymes that specifically target the 
mutated genetic sequence and create a DNA
break. If a DNA template for the desired 
replacement sequence is provided, the mutated 
stretch can be repaired by an exchange of 
genetic information known as homologous 
recombination (Fig. 1b)°. One benefit of this 
strategy is that it takes advantage of an innate 
method of recombination used by the cell to 
repair harmful DNA breaks, and so does not 
adversely affect other genomic regions, such 
as regulatory sequences. Gene addition, on the 
other hand, relies on a semi-random integra- 
tion method that may disrupt genetic regula- 
tory sequences, because the extra sequence 
integrates imprecisely within the genome. This 
has been shown’ to cause toxic side effects, for 
example vector-mediated activation of cancer- 
causing oncogenes. 

Figure 1 | Strategies for treating inherited diseases with gene therapy. a, In gene addition, stem cells
harbouring a genetic mutation are pre-stimulated to prepare them for genetic manipulation, and then a
functional copy of the mutated gene is transported into the cells in a retroviral vector. The functional gene
integrates into the genome at a semi-random site, restoring normal gene expression. b, An alternative
approach is gene repair, whereby a nuclease enzyme engineered to create DNA breaks around the mutated
site and a DNA template for the functional sequence are added to pre-stimulated cells using a viral vector.
The mutated sequence is then replaced with the functional copy by homologous recombination (white
dotted lines). Genovese et al.⁵ have optimized pre-stimulation during gene repair in haematopoietic stem
cells, by treating the cells both with factors that prevent early differentiation (dmPGE₂ and SR1) and with
signalling molecules called cytokines that decrease the cells’ sensitivity to the toxic effects of the nuclease.

Zinc-finger nucleases (a zinc-finger protein
that binds to the desired sequence of DNA,
combined with a nuclease, which cleaves DNA
wherever the molecule binds) were the first
enzymes to be designed for gene repair and 
have been shown to work in cell lines®. Other 
nucleases — for example, those modified 
from artificial restriction enzymes known as 
TALENs, or from RNA-guided enzymes based
on the bacterial CRISPR-associated system— 
have since been used’ to correct genetic muta- 
tions in human cell lines, including induced 
pluripotent stem cells (iPSCs), which have 
been genetically reset and can give rise to most 
cell types in the body. Genovese and colleagues 
have used this approach to correct mutations 
in a fraction of human haematopoietic stem
cells (HSCs), which are a key target for treating 
inherited disorders associated with blood cells. 
How did the authors achieve this? The hurdle 
to overcome lay in the fact that homologous 
recombination occurs only in cycling cells (in 
the S/G2 phase of the cell cycle), and so the 
procedure is ineffective in adult stem cells such 
as HSCs, which are mostly quiescent. Using 
several tricks, Genovese et al. achieved efficient
integration of a green fluorescent protein
sequence into the genome of human HSCs at 
chosen sites, including a protein-coding region 
(exon 5) of the IL2RG gene that is mutated 
in patients with X-linked severe combined 
immunodeficiency syndrome (SCID-X1). 


To prepare HSCs for gene repair, Genovese 
and colleagues stimulated the cells with signal- 
ling molecules called cytokines for two days, 
and on the second day used a viral vector to 
introduce a recombination template. They then 
used electric currents to permeabilize the cells, 
and added a Zn-finger nuclease designed to tar- 
get the desired DNA sequence. It is likely that 
pretreatment with cytokines made the HSCs 
less sensitive to the toxic effects of introducing 
the nucleases‘ and, more importantly, that the 
treatment prompted some of the cells to enter 
the cell cycle and become capable of mediating 
homologous recombination. In addition, the 
authors treated the cells with two compounds — 
dimethyl prostaglandin E₂ (dmPGE₂) and SR1,
an inhibitor of the aryl-hydrocarbon receptor 
protein — that prevent premature differentia- 
tion, keeping the HSCs in a stem-cell state. 

Genovese and co-workers’ protocol enabled 
site-specific genome editing in some of the 
HSCs. When the treated cells were transferred 
into immunodeficient mice, edited cells were 
detected for up to 18 weeks, and could even be 
transferred to a second recipient. When their 
technique was applied to HSCs taken from 
a patient with SCID-X1, the authors found 
that gene correction had occurred in 3-11% 
of progenitor cells and the myeloid cells that 
differentiated from them. 
This work is undoubtedly a step towards
using gene repair for gene therapy. Correc- 
tion of SCID-X1 is known’ to require only a 
few IL2RG-expressing HSCs, and so it may be 
that this technology can be used as it stands to 
treat patients with SCID-X1. But before that 
can be tested, it will be necessary to evaluate 
the frequency at which the protocol causes 
harmful DNA mutations, brought about by the 
cells’ attempts to repair off-target DNA breaks 
through an alternative, error-prone mode of 
recombination. It is reassuring, however, that 
Genovese and co-workers observed few such 
off-target effects when they tested their protocol
in a precancerous cell line.

If this method of gene repair in HSCs is to 
be applied to diseases that demand a higher 
level of gene correction than SCID-X1 (for 
example, Wiskott-Aldrich syndrome, adreno- 
leukodystrophy, metachromatic leukodystrophy 
or β-thalassaemia’”’), further optimization is
required. For example, the design of nucleases 
that increase the fidelity of DNA cuts, com- 
bined with the in vitro expansion of treated 
HSCs before transfusion, might increase the
efficiency of the protocol. It is interesting to note 
that an improved gene-addition protocol, which 
uses self-inactivating viral vectors designed to 
reduce activation of host genes, including onco- 
genes, has led to the successful and apparently 
safe treatment of the inherited disorders men- 
tioned above’ “°. Competition between the two 
strategies is likely to continue for some time. 

Ultimately, gene repair seems to be an attrac- 
tive strategy for targeting stem cells, such 
as HSCs, that have been taken from tissues 
affected by inherited disorders. As technology 
that allows reprogramming of differentiated 
cells to stem-cell states improves, gene correction
in reprogrammed cells such as iPSCs or
induced HSCs" engineered from the patient's 
blood cells will probably become a preferable 
option. Until this goal is reached, however, it is 
good to see that Genovese and colleagues have 
made another advance in extending the usefulness
of gene therapy.
Search for Majorana neutrinos with the
first two years of EXO-200 data 


The EXO-200 Collaboration* 


Many extensions of the standard model of particle physics suggest that neutrinos should be Majorana-type fermions (that
is, that neutrinos are their own anti-particles), but this assumption is difficult to confirm. Observation of neutrinoless
double-β decay (0νββ), a spontaneous transition that may occur in several candidate nuclei, would verify the Majorana
nature of the neutrino and constrain the absolute scale of the neutrino mass spectrum. Recent searches carried out with
⁷⁶Ge (the GERDA experiment) and ¹³⁶Xe (the KamLAND-Zen and EXO (Enriched Xenon Observatory)-200 experiments)
have established the lifetime of this decay to be longer than 10²⁵ years, corresponding to a limit on the neutrino mass of
0.2–0.4 electronvolts. Here we report new results from EXO-200 based on a large ¹³⁶Xe exposure that represents an almost
fourfold increase from our earlier published data sets. We have improved the detector resolution and revised the data
analysis. The half-life sensitivity we obtain is 1.9 × 10²⁵ years, an improvement by a factor of 2.7 on previous EXO-200
results. We find no statistically significant evidence for 0νββ decay and set a half-life limit of 1.1 × 10²⁵ years at the 90 per
cent confidence level. The high sensitivity holds promise for further running of the EXO-200 detector and future 0νββ
decay searches with an improved Xe-based experiment, nEXO.


Majorana fermions, a class of neutral spin-1/2 particles described by
two-component spinors, have been an element of quantum field theory
since its inception¹,². Electrons and other spin-1/2 elementary particles
with distinct antiparticles are, however, described by four-component
Dirac spinors. Majorana quasiparticles may have been observed in condensed
matter systems³ where neutrality is achieved through the collective
action of electrons and holes. Among the known elementary particles,
only neutrinos are Majorana fermion candidates, owing to their intrinsic
neutrality. Confirmation of this property would imply the non-conservation
of lepton number, an additive quantum number that, unlike
charge or colour, is not related to any known gauge symmetry. As yet,
lepton number has been empirically found to be conserved. Neutrinos
are also remarkable for their small, yet finite, masses⁴ that are generally
difficult to explain, but arise naturally in many extensions⁵,⁶ of the
standard model of particle physics. A generic consequence of many
such extensions is that neutrinos should be of the Majorana variety.
The most sensitive probe for Majorana neutrinos is a nuclear process
known as neutrinoless double-β decay (0νββ), whereby a nucleus
decays by emitting two electrons and nothing else, while changing its
charge by two units⁷. A related double-β decay process, known as
two-neutrino double-β decay (2νββ), is allowed by the standard model and
has been observed in many nuclei, ¹³⁶Xe among them⁸,⁹. It provides,
however, no direct information on the Majorana/Dirac question. The
exotic 0νββ can be distinguished from the 2νββ by measuring the sum
energy of the two electrons, which is peaked at the Q-value for the former
and is a continuum for the latter (the Q-value is the mass difference
between the mother and daughter nuclei). We refer to this region around
the Q-value as the 0νββ region of interest (ROI). The half-life of the 0νββ
is related to the effective Majorana neutrino mass (⟨m_ββ⟩) by a phase space
factor and a nuclear matrix element. Hence observation of the 0νββ decay
would discover elementary Majorana particles, demonstrate lepton number
violation and measure the neutrino mass scale ⟨m_ββ⟩, at least to
within the theoretical uncertainty of the nuclear matrix elements¹⁰.
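
The relation between half-life and effective mass described above is commonly written in the form below. This is the standard textbook expression rather than one spelled out in this Letter: the symbols G^{0ν} (phase space factor), M^{0ν} (nuclear matrix element) and m_e (electron mass) are notation assumed here, and conventions for absorbing m_e and the axial coupling into G^{0ν} differ between references.

```latex
\left[ T_{1/2}^{0\nu} \right]^{-1}
  = G^{0\nu} \, \left| M^{0\nu} \right|^{2} \,
    \frac{\langle m_{\beta\beta} \rangle^{2}}{m_e^{2}}
```

Because the half-life scales as the inverse square of the effective mass, a fourfold improvement in a half-life limit tightens the corresponding mass limit by only a factor of two.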
Recent sensitive searches for 0νββ have been carried out in ⁷⁶Ge
(GERDA¹¹) and ¹³⁶Xe (KamLAND-Zen¹² and EXO-200¹³). These
experiments have set limits on the Majorana neutrino mass of ~0.2–0.4 eV,
and have cast doubt on an earlier claim of observation¹⁴. In this
Letter we report on new 0νββ search results from the EXO-200 experiment
based on about two years of data.


The EXO-200 detector 


EXO-200 has been described in detail elsewhere¹⁵. Briefly, the detector
is a cylindrical liquid xenon (LXe) time projection chamber (TPC),
roughly 40 cm in diameter and 44 cm in length. Two drift regions are
separated in the centre by a cathode. The LXe is enriched to 80.6% in
¹³⁶Xe, the 0νββ candidate (Q = 2,457.83 ± 0.37 keV; ref. 16). The
TPC provides X-Y-Z coordinate and energy measurements of ionization
deposits in the LXe by simultaneously collecting the scintillation
light and the charge. Charge deposits spatially separated by about
1 cm or more are individually observed and the position accuracy for
isolated deposits is a few millimetres. Avalanche photodiodes (APDs)
measure the scintillation light. Small radioactive sources can be positioned
at standard positions near the TPC to calibrate the detector and
monitor its stability.

The TPC is shielded from environmental radioactivity on all sides by
~50 cm of HFE-7000 cryofluid¹⁷ (HFE) maintained at ~167 K inside a
vacuum-insulated copper cryostat. Further shielding is provided by at
least 25 cm of lead in all directions. The entire assembly is housed in a
clean-room located underground at a depth of 1,585 metres water
equivalent¹⁸ (a measure of the effective shielding accounting for variations
in the overhead rock) at the Waste Isolation Pilot Plant near Carlsbad
(New Mexico). Four of the six sides of the clean-room are instrumented
with plastic scintillator panels (‘muon-veto panels’) recording the passage
of cosmic ray muons. An extensive materials screening campaign¹⁹ was
employed to minimize the radioactive background produced by the
detector components.


Data analysis and methodology 


The data analysis methods in this work follow closely those presented
in detail elsewhere⁹. Events in the detector are classified as single-site
(SS) or multi-site (MS) according to the number of detected charge
deposits. 0νββ events are predominantly SS whereas γ backgrounds are
mostly MS. For each event, the energy is determined as a linear
combination of charge and scintillation, while a ‘standoff distance’ is
defined as the distance between a charge deposit and the closest material
that is not LXe, other than the cathode. To search for 0νββ, a binned
maximum-likelihood fit is performed simultaneously over the SS and
MS events using probability density functions (PDFs) in energy and
standoff distance, generated using a Geant4-based²⁰ Monte Carlo
simulation (MC). The energy range 980–9,800 keV is used. The
‘low-background data set’ (physics data) is obtained after applying event
selection cuts. With respect to ref. 9, the current analysis additionally
includes: (1) improved signal processing for the scintillation waveforms
resulting in lower noise; (2) ²²⁶Ra source calibration data; (3) an
expanded fiducial volume; (4) the estimation of systematic errors related
to the 0νββ ROI; and (5) updated background and systematic studies
relevant to the 0νββ search.

*A list of authors and affiliations appears at the end of the paper.

The data set presented here (Run 2) combines Run 2a (already used
for refs 9 and 13, 22 September 2011 to 15 April 2012) and Runs 2b and
2c (16 April 2012 to 1 September 2013). After removing periods of poor
data quality and calibration runs, the total amount of low-background
data for this analysis is 477.60 ± 0.01 days, a 3.8-fold increase from previous
EXO-200 publications. The primary tool used for understanding
and correcting the detector energy measurement is the 2,615-keV γ line
of ²⁰⁸Tl from a ²²⁸Th source deployed at least twice weekly during the
time spanned by this data set. Seven multiday calibration campaigns
involving the use of multiple sources (²²⁸Th, ⁶⁰Co, ²²⁶Ra and ¹³⁷Cs) were
performed at roughly three-month intervals throughout the data set.
The lifetime of ionization electrons in the LXe is better than 2 ms for
the entire data set, more than sufficient to collect charge across the full
volume of the detector. We determine the optimal linear combination
of scintillation and ionization signals once per week by minimizing the
width of the 2,615-keV line. To prevent making analysis decisions that
could bias the results in the ROI, the low-background data were partially
‘masked’ to hide ~2/3 of the live-time for SS events between 2,325
and 2,550 keV. Live-time already analysed in previous publications (for
example, Run 2a) was not masked.
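
The weekly optimization of the charge and scintillation combination can be illustrated with a toy scan. The sketch below is not the collaboration's procedure: the event generator, the noise widths and the mixing-angle parameterization are all invented for illustration. It only shows why an anticorrelated pair of signals has a combination that is narrower than either signal alone.

```python
import math
import random
import statistics

E_PEAK = 2615.0  # keV, the 208Tl calibration line named in the text


def make_toy_events(n, seed=0):
    """Generate (charge, light) pairs with anticorrelated fluctuations.

    All widths are invented toy values, not detector measurements.
    """
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        shared = rng.gauss(0.0, 0.04)  # anticorrelated fractional fluctuation
        charge = E_PEAK * (1.0 + shared) + rng.gauss(0.0, 30.0)
        light = E_PEAK * (1.0 - shared) + rng.gauss(0.0, 60.0)
        events.append((charge, light))
    return events


def combined_energy(charge, light, theta):
    """Normalised linear combination, parameterised by a mixing angle."""
    wq, ws = math.cos(theta), math.sin(theta)
    return (wq * charge + ws * light) / (wq + ws)


def relative_width(events, theta):
    """Peak width divided by peak position for a given mixing angle."""
    energies = [combined_energy(q, s, theta) for q, s in events]
    return statistics.pstdev(energies) / statistics.fmean(energies)


def best_angle(events, steps=90):
    """Scan the angle from 0 (charge only) to 90 degrees (light only)."""
    angles = [math.radians(i * 90.0 / steps) for i in range(steps + 1)]
    return min(angles, key=lambda t: relative_width(events, t))


events = make_toy_events(5000)
theta_opt = best_angle(events)
print(f"optimal angle: {math.degrees(theta_opt):.0f} deg")
print(f"charge only: {100 * relative_width(events, 0.0):.2f}%  "
      f"combined: {100 * relative_width(events, theta_opt):.2f}%")
```

In this toy the shared fluctuation cancels in the combination, so the optimum sits between the two extremes and is pulled towards the channel with less independent noise.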

The energy resolution of the detector is dominated by electronic noise 
in the scintillation readout and exhibits variations over time due to changes 
in this noise. We apply a de-noising algorithm to the scintillation signals 
during post-processing, improving the detector resolution and reducing 
its time dependence. This algorithm attempts to find the optimal combi- 
nation of APD waveforms to determine the amount of scintillation light 
for each event, taking into account the measured electronic noise of each 
APD channel as well as the position of each charge deposition in the 
detector. Figure 1 shows the resolution with and without de-noising. 

We define an effective, time-independent energy resolution function⁹,
$\sigma^2(E) = \sigma_{\mathrm{elec}}^2 + b^2 E + c^2 E^2$. Here $\sigma_{\mathrm{elec}}$, $b$ and $c$ are 20.8 keV, 0.628 keV$^{1/2}$
and 1.10 × 10⁻³ (25.8 keV, 0.602 keV$^{1/2}$ and 4.04 × 10⁻³) for SS (MS),
determined by a maximum-likelihood fit to calibration data taken during
Run 2. This function is folded with the energy distributions derived
from the simulation to create the PDFs used in final fits. The effective
resolution (σ/E) for SS (MS) at the 0νββ Q-value is 1.53 ± 0.06%
(1.65 ± 0.05%).
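
As a numerical cross-check, the quoted parameters can be evaluated at the Q-value. The sketch assumes the quadrature form σ²(E) = σ_elec² + b²E + c²E² with c of order 10⁻³, which is the reading under which the quoted parameters reproduce the stated 1.53% and 1.65%. The function and dictionary names are illustrative only.

```python
import math

Q_VALUE_KEV = 2457.83  # 136Xe Q-value quoted in the text

# (sigma_elec [keV], b [keV^0.5], c [dimensionless]) for each event class
RESOLUTION_PARAMS = {
    "SS": (20.8, 0.628, 1.10e-3),
    "MS": (25.8, 0.602, 4.04e-3),
}


def sigma_kev(energy_kev, sigma_elec, b, c):
    """Resolution from sigma^2 = sigma_elec^2 + b^2 E + c^2 E^2."""
    variance = sigma_elec**2 + (b**2) * energy_kev + (c**2) * energy_kev**2
    return math.sqrt(variance)


def relative_resolution_percent(event_class, energy_kev=Q_VALUE_KEV):
    """sigma/E in per cent for 'SS' or 'MS' events at the given energy."""
    sigma = sigma_kev(energy_kev, *RESOLUTION_PARAMS[event_class])
    return 100.0 * sigma / energy_kev


for event_class in ("SS", "MS"):
    print(event_class, f"{relative_resolution_percent(event_class):.2f}%")
```

At the Q-value the electronic-noise and b²E terms dominate the variance, which is consistent with the statement that the resolution is driven by noise in the scintillation readout.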

The fiducial volume is larger than in ref. 9 to maximize the sensitive
mass while maintaining systematic uncertainties at an acceptable level.
Events in the fiducial volume are required to have 182 mm > |Z| >
10 mm (where Z = 0 is the cathode plane) and to be contained in a hexagon
with 162 mm apothem. This represents a ¹³⁶Xe mass of 76.5 kg,
corresponding to 3.39 × 10²⁶ atoms of ¹³⁶Xe and, with the quoted live-time,
results in an exposure of 100 kg yr (736 mol yr).
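
The exposure figures follow from the fiducial mass and the live-time by simple arithmetic, sketched below. The molar mass of ¹³⁶Xe (135.907 g/mol) and the 365.25-day year are assumptions of this sketch, not values stated in the text.

```python
AVOGADRO = 6.02214076e23      # atoms per mole
MOLAR_MASS_XE136_G = 135.907  # g/mol, assumed value for 136Xe
DAYS_PER_YEAR = 365.25        # assumed calendar convention

fiducial_mass_kg = 76.5       # 136Xe mass in the fiducial volume (from the text)
live_time_days = 477.60       # low-background live-time (from the text)

live_time_yr = live_time_days / DAYS_PER_YEAR
moles = fiducial_mass_kg * 1000.0 / MOLAR_MASS_XE136_G
atoms = moles * AVOGADRO

exposure_kg_yr = fiducial_mass_kg * live_time_yr
exposure_mol_yr = moles * live_time_yr

print(f"atoms of 136Xe: {atoms:.3g}")
print(f"exposure: {exposure_kg_yr:.0f} kg yr, {exposure_mol_yr:.0f} mol yr")
```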


Investigation and determination of systematic errors 


The main systematic uncertainties relevant to the search for 0νββ are
related to signal efficiency, location of the 0νββ ROI within the spectrum,
and estimation of the background in the ROI.

Figure 1 | Effect of de-noising on the energy resolution, σ/E. Shown is the
resolution for SS events at the 2,615-keV ²⁰⁸Tl full-absorption peak (with and
without de-noising) and propagated to the 0νββ Q-value (with de-noising).
The variation with time (shown on x axis) is caused by changes in the noise of
the APD front-end electronics. The horizontal dashed line shows the effective
Q-value SS energy resolution used for the data set (1.53%). MS resolution
(not shown) exhibits similar behaviour. Error bars, ±1 s.d.


To verify the simulation’s ability to model efficiencies and the
background, we compare measurement and simulation of calibration
sources deployed at various positions around the TPC, investigating
in particular: (1) the energy and standoff distance distributions, (2) the
integrated rate of selected events, and (3) the SS/MS event ratio versus
energy. A representative set of results for (1) is shown in Fig. 2, where
simulation–data agreement for the ²²⁶Ra source is presented. ²²⁶Ra is
a particularly valuable source because of several γ lines that map a broad
energy region including the 0νββ ROI. The energy spectrum shows
good agreement across the energy range of the analysis. Comparable
results were also obtained with the ⁶⁰Co and ²²⁸Th sources. The standoff
distance agreement is within statistical errors except in the first
10 mm bin, where the simulation produces more events in the fiducial
volume than seen in data.

Figure 2 | Comparison of energy and standoff distance distributions of a
²²⁶Ra calibration source for SS events in simulation and data. Energy (main
panel) and standoff distance (inset), both in normalized counts, are shown for
data (black points) and simulation (blue line). The calibration source is at a
position near the cathode outside the TPC. Error bars, ±1 s.d.
Discrepancies in the shapes of energy and standoff distance distributions
between data and simulation affect the estimation of the background
in the 0νββ ROI. To quantify this effect, we calculate skewing
functions based on the small discrepancies observed in source calibration
studies. We distort the background PDFs with the skewing functions
and use these to produce a set of ‘toy’ MC data sets, which are
then fitted to un-skewed PDFs. The change in the 0νββ ROI background
is 9.2%, which we take as systematic error.

In the rate comparison studies (2), we combine the total number of
selected events in data and simulation as (data − MC)/data for several
source positions. The error-weighted average of the results is calculated
using the fiducial volume in this analysis as well as that in ref. 9.
The difference between these values is 1.7%, which we combine with the
underlying fiducial volume uncertainty (also 1.7%; ref. 9), conservatively
assuming full correlation, to produce a total error on the detector
efficiency of 3.4%.

To address (3), the ratio of the number of SS events to the total
number of events, SS/(SS + MS), is compared between data and simulation
for three sources in Fig. 3. The general behaviour is largely independent
of the underlying spectral shape. We choose to assign a single
systematic uncertainty to the SS/(SS + MS) ratio of 9.6%, calculated
from the weighted average of the maximum deviations observed for
the ²²⁸Th, ⁶⁰Co and ²²⁶Ra (data from the latter available after June 2013)
sources at several different source locations in each calibration campaign.

Event selection requires an event to be fully reconstructed in all
three coordinates (X, Y and Z). We compare the relative efficiency of
this requirement for 2νββ from MC to the measured relative efficiency
derived from the background-subtracted low-background energy
spectrum. Here, we define the relative efficiency as the ratio of the
number of events passing the entire set of selection requirements to the
number passing the set not including the full-reconstruction requirement.
The relative efficiency from simulation changes modestly across
the 2νββ energy range (>99% to 90% from 980 keV to 2,450 keV) and
similar behaviour is seen in data. The average deviation between simulation
and data over the 2νββ spectrum (7.8%) is taken as a systematic
error on the efficiency.

Figure 3 | Event multiplicity in data and simulation. a, Plot of SS/(SS + MS)
ratio versus energy in data for ²²⁶Ra, ⁶⁰Co and ²²⁸Th calibration sources.
b, Comparison of SS/(SS + MS) ratio between data and simulation for the three
sources, as a function of energy. Despite having different underlying energy
spectra, all sources exhibit similar behaviour across the shown energy range
when comparing data and simulation (b). Error bars, ±1 s.d.
Table 1 | 0νββ signal efficiency and associated systematic errors

Source                            Signal efficiency (%)   Error (%)
Summary from ref. 9               93.1                    0.9
Partial reconstruction            90.9                    7.8
Fiducial volume/rate agreement    NA                      3.4
Total                             84.6                    8.6

‘Partial reconstruction’ refers to the requirement that all events be fully reconstructed in X, Y and Z. The
summary for event selection from ref. 9 includes all efficiencies and related errors except fiducial volume
and partial reconstruction, which have been recalculated in this work for 0νββ. NA, not applicable.


The uncertainty on the location of the ROI in the spectrum is
dominated by a possible energy-scale difference between β-like events
in the LXe (for example, 0νββ) and γ-like events (including most
backgrounds and the sources used for the primary energy calibration).
We define the ‘β-scale’ as E_β = B·E_γ, where E_β (E_γ) is the energy for
depositions from βs (γs) and B is a measured constant. We determine
the β-scale by fitting to the 2νββ-decay-dominated low-background
data and find B = 0.999 ± 0.002.

Several cross-checks were performed to search for energy dependence
in the β-scale. The above fits were performed using different energy
thresholds and with different background PDFs produced using the
skewing functions discussed earlier. We also fitted the low-background
data assuming a linear energy dependence (for example, p₀ + p₁E_γ) for
B. In all cases the results are consistent with the original fit, providing
no evidence for energy dependence of the β-scale. The estimate of the
β-scale is also robust against a different choice of 2νββ spectral shape²¹.

To investigate the dependence of the ROI background estimate on
the completeness of the model used to fit the data, we derive PDFs
from different source locations and introduce them separately into the
default background model used in the fit. The relative change of the
estimated ROI background is then determined. The three background
PDFs considered in this study are ²³⁸U in the HFE and inner cryostat,
and ⁶⁰Co in the copper source guide tube. These were chosen because
the initial source location affects relative amplitudes and spectral features
in the ROI, that is, the ²¹⁴Bi γ line (2,448 keV) and the ⁶⁰Co sum peak.
This study indicates a total possible deviation of 5.7% for the expected
background counts in the ROI.

The residual time dependence of the energy resolution (Fig. 1) can
introduce additional counts in the ROI from the 2,615-keV ²⁰⁸Tl peak.
This was estimated to affect the ROI background counts by ±1.5%.

A summary of the 0νββ signal efficiency and associated uncertainty
is presented in Table 1. Table 2 summarizes the uncertainties on the
estimation of background in the ROI. These errors are explicitly included
as input to the final fit to the low-background data. Items not
listed in the tables, such as the β-scale and the SS/MS ratios, still contribute
to the total systematic error on the 0νββ signal as they are
propagated to the final result by the maximum-likelihood fit to the
low-background data.
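
The totals in the two tables can be reproduced from the individual terms quoted in the text. The sketch assumes that fully correlated terms add linearly (as stated for the detector efficiency) and that the remaining terms are independent and add in quadrature. The quadrature rule is an assumption here, adopted because it reproduces the quoted totals.

```python
import math


def quadrature(*errors):
    """Combine independent errors in quadrature."""
    return math.sqrt(sum(e * e for e in errors))


# Fully correlated terms add linearly: rate agreement + fiducial volume (1.7% each)
detector_efficiency_error = 1.7 + 1.7

# Table 1: signal efficiency and its error (per cent)
signal_efficiency = 100.0 * (93.1 / 100.0) * (90.9 / 100.0)
signal_error = quadrature(0.9, 7.8, detector_efficiency_error)

# Table 2: ROI background error (per cent): shape distortion, model, resolution
background_error = quadrature(9.2, 5.7, 1.5)

print(f"signal efficiency: {signal_efficiency:.1f}% +/- {signal_error:.1f}%")
print(f"ROI background error: {background_error:.1f}%")
```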

Neutrons arising from cosmic-ray muons or radioactive decays in
the salt surrounding the laboratory may contribute background to the
0νββ ROI via neutron capture or spallation processes. The contribution
in the ROI is expected to arise primarily from neutron-capture
γs in the LXe and surrounding materials (for example, capture on
⁶³Cu and ⁶⁵Cu in the copper components, and on ¹³⁶Xe in the LXe).
A simulation using a simplified experimental geometry and employing
the FLUKA²²,²³ and SOURCES²⁴ software packages is used to generate,
track and thermalize neutrons. The resulting neutron capture rates are
used as input to the Geant4-based²⁰ EXO-200 simulation package⁹, with
the respective n-capture γ-spectra produced on the basis of ENSDF
information²⁵ for the given nuclides. The produced PDFs are used
in fits to the low-background data. Good shape agreement is found
between these PDFs and data coincident with muon-veto-panel events.

Table 2 | Systematic errors on background determination in the ROI

Source                            Error (%)
Background shape distortion       9.2
Background model                  5.7
Energy resolution variation       1.5
Total                             10.9

These errors arise from incorrect modelling of the background shape (‘Background shape distortion’),
incorrect or incomplete background model (‘Background model’) and the residual variation of the
energy resolution over time (‘Energy resolution variation’; see, for example, Fig. 1).


Results 

The fit to the low-background data minimizes the negative log-likelihood
function constructed using a signal and background model composed
of PDFs from simulation. A profile-likelihood scan is performed to search
for a 0νββ signal.
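
A profile-likelihood scan can be sketched with a toy binned fit. The example below is far simpler than the analysis described here: it uses one invented signal template and one flat background template in ten bins, a single nuisance parameter, and an artificial data set equal to the expectation for 5 signal and 30 background counts. All names and numbers are illustrative assumptions.

```python
import math

# Toy templates (each sums to 1); shapes are invented for illustration only.
SIG_PDF = [0.0, 0.0, 0.0, 0.1, 0.4, 0.4, 0.1, 0.0, 0.0, 0.0]
BKG_PDF = [0.1] * 10

# Toy data: exactly the expectation for 5 signal and 30 background counts.
DATA = [5.0 * s + 30.0 * b for s, b in zip(SIG_PDF, BKG_PDF)]


def nll(n_sig, n_bkg):
    """Binned Poisson negative log-likelihood, constant terms dropped."""
    total = 0.0
    for n, s, b in zip(DATA, SIG_PDF, BKG_PDF):
        mu = n_sig * s + n_bkg * b
        total += mu - n * math.log(mu)
    return total


def profiled_nll(n_sig, lo=1e-6, hi=200.0, iterations=200):
    """Minimise over the background amplitude at fixed signal.

    The NLL is convex in n_bkg, so a ternary search finds the minimum.
    """
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if nll(n_sig, m1) < nll(n_sig, m2):
            hi = m2
        else:
            lo = m1
    return nll(n_sig, 0.5 * (lo + hi))


def profile_scan(max_signal=20.0, step=0.5):
    """Return (signal grid, 2 * delta-NLL) with the background profiled out."""
    grid = [i * step for i in range(int(max_signal / step) + 1)]
    values = [profiled_nll(s) for s in grid]
    best = min(values)
    return grid, [2.0 * (v - best) for v in values]


grid, curve = profile_scan()
best_fit = grid[curve.index(min(curve))]
# Largest scanned signal with 2 * delta-NLL <= 2.71 (90% CL under Wilks' theorem)
upper_90 = max(s for s, c in zip(grid, curve) if c <= 2.71)
print(f"best-fit signal: {best_fit}, 90% CL upper bound on grid: {upper_90}")
```

The real fit profiles many nuisance parameters at once and checks the coverage with toy Monte Carlo studies; the single ternary search here stands in for that multidimensional minimization.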

The PDFs chosen for the low-background fit model are those used in
ref. 9 plus a ‘far-source’ ²³²Th PDF, a ¹³⁷Xe PDF and neutron-capture-related
PDFs, including ¹³⁶Xe neutron capture in the LXe, ¹H neutron
capture in the HFE, and ⁶³Cu, ⁶⁵Cu neutron capture in Cu components
(LXe vessel, inner and outer cryostats). The far-source ²³²Th PDF allows
for background contributions from Th in materials far from the TPC,
for example in the HFE and in the copper cryostat. (Remote ²³⁸U is
included in the fit model via ²²²Rn, simulated in the air between the
cryostat and Pb shield.) We combine the neutron-capture-related PDFs
to form one PDF, allowing the relative rates of the component PDFs to
float within 20% of their simulation-estimated values. The total rate of
this summed PDF is allowed to float unconstrained.

We constrain the single-site fractions, SS/(SS + MS), of all components
to be within 9.6% of their value calculated from simulation. An
additional 90% correlation between single-site fractions of γ components
is introduced into the likelihood function, owing to the consistent
behaviour observed in these parameters in calibration studies (for
example, Fig. 3). The overall normalization is allowed to float within
the estimated systematic errors (8.6%). The background-PDF amplitudes
within the ROI are also allowed to vary within their estimated
systematic errors (10.9%). The β-scale is not allowed to float during the
fit, but is manually profiled while performing the profile-likelihood
scan for 0νββ.

The final step before performing the fit was the unmasking of live-time
around the SS ROI. However, before unmasking the full data set,
we investigated backgrounds associated with Xe feeds, irregular occurrences
in which additional Xe gas is introduced into the purification
circulation loop. (These Xe feeds occurred 10 times over the run period
and are known to temporarily elevate, for example, Rn levels in the
detector.) The live-time in the two-week periods following the 10 feed
events was unmasked first to search for increased background levels in
the ROI. No evidence for such an increase was found and the unmasking
of the remaining live-time proceeded.

Figure 4 | Fit results projected in energy. a, b, Main panels show SS (a) and
MS (b) events, as counts versus energy, with a zoom-in (inset) around the ROI:
2,250–2,600 keV (2,100–2,700 keV) for SS (MS); the bin size is 14 keV, and data
points are shown in black. Lower panels in a and b show residuals between data
and best fit normalized to the Poisson error, ignoring bins with 0 events. The
green (blue) shaded regions in the lower panels represent ±1σ (±2σ)
deviations. The 7 (18) events between 4,000 and 9,800 keV in the SS (MS)
spectrum have been collected into an overflow bin for presentation here. The
vertical (red) lines in the SS spectra indicate the ±2σ ROI. The result of the
simultaneous fit to the standoff distance is not shown here. Several background
model components (including Rn, ¹³⁵Xe and ¹³⁷Xe, n-capture, ²³²Th (far),
Vessel, 0νββ and 2νββ, all described further in the text) are indicated in the
main panel of b to show their relative contributions to the spectra. Error bars on
data points, ±1 s.d.

The results of the maximum-likelihood fit are presented in Fig. 4. The
measured 2νββ decay rate is consistent with ref. 9. From the best-fit
model, the estimate of the background in the 0νββ ±2σ ROI is 31.1 ±
1.8(stat.) ± 3.3(sys.) counts, or (1.7 ± 0.2) × 10⁻³ keV⁻¹ kg⁻¹ yr⁻¹ normalized
to the total Xe exposure (123.7 kg yr). Both this and the ±1σ
value (also (1.7 ± 0.2) × 10⁻³ keV⁻¹ kg⁻¹ yr⁻¹) are consistent with previous
results, 1.5 ± 0.1 (1.4 ± 0.1) with the same units in the ±1σ (±2σ)
ROI. The dominant backgrounds arise from ²³²Th (16.0 counts), ²³⁸U
(8.1 counts) and ¹³⁷Xe (7.0 counts). This amount of ¹³⁷Xe is consistent
with estimates from studies of the activation of ¹³⁶Xe in muon-veto-tagged
data. The total number of events seen in this region is 39. The
best-fit value of 0νββ counts is 9.9, consistent with the null hypothesis at
1.2σ as calculated using toy MC studies. The corresponding profile-likelihood
scan of this parameter is shown in Fig. 5.
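
The normalized background rate can be checked against the ROI count estimate. The sketch assumes that the ±2σ ROI is 4σ wide with the SS resolution of 1.53% at the Q-value, and that the statistical and systematic errors are combined in quadrature. Both are assumptions of this example rather than statements from the text.

```python
import math

roi_counts = 31.1           # expected background in the +/-2 sigma ROI (from the text)
stat_err, sys_err = 1.8, 3.3
xe_exposure_kg_yr = 123.7   # total Xe exposure (from the text)

# Assumed ROI width: 4 sigma, with sigma/E = 1.53% at Q = 2,457.83 keV
sigma_kev = 0.0153 * 2457.83
roi_width_kev = 4.0 * sigma_kev

norm = roi_width_kev * xe_exposure_kg_yr
background_index = roi_counts / norm
background_index_err = math.hypot(stat_err, sys_err) / norm

print(f"ROI width: {roi_width_kev:.0f} keV")
print(f"background: ({background_index * 1e3:.1f} +/- "
      f"{background_index_err * 1e3:.1f}) x 10^-3 per keV per kg per yr")
```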

A number of cross-checks were performed on the result. No event
reconstruction anomalies were found after hand-scanning all events
in the ROI. The time-between-events distribution of the ROI events is
consistent with a constant-rate process and the standoff distance
distribution of events in data is consistent with the best-fit model.
Additional backgrounds were considered that could contribute events
to the ROI. In particular, we tested for ¹¹⁰ᵐAg and ⁸⁸Y because of their
possible association with the measurement in ref. 12, and found that
both produce a distinct high-multiplicity signature in EXO-200
(SS/(SS + MS) ~ 5–10%). Separate fits including each of these PDFs
contributed the following counts to the ±2σ ROI: N(¹¹⁰ᵐAg) = 0.04 ± 0.02
and N(⁸⁸Y) = 0.02 ± 0.01. Finally, we were able to exclude any significant
effect on the ROI background from ²¹⁴Bi external to the Pb shield,
for example, from ²³⁸U in the surrounding salt.


Discussion 


In summary, we report a 90% confidence level lower limit on the 0νββ
half-life of 1.1 × 10²⁵ yr. With the nuclear matrix elements of refs
26–29 and the phase space factor from ref. 21, this corresponds to an
upper limit on the Majorana neutrino mass of 190–450 meV. Using
the three flavour fit of ref. 30 (also M. Tortola and J. Valle, personal
communication), we further use this range of effective mass limits to
construct a constraint on the mass m_min of the lightest neutrino mass
eigenstate, assuming the most disadvantageous combination of CP
phases. This corresponds to m_min < 0.69–1.63 eV, in the case where
neutrinos are Majorana particles.

Figure 5 | Profile likelihood, λ, for 0νββ counts. The horizontal dashed lines
represent the 1σ and 90% confidence levels assuming the validity of Wilks’
theorem³¹,³², intersecting the profile curve at (3.1, 18) and 24 0νββ counts,
respectively. From toy Monte Carlo studies, the best-fit value is consistent with
the null hypothesis at 1.2σ.
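
A back-of-envelope conversion from a count limit to a half-life limit is sketched below. It is not the collaboration's procedure, which profiles the β-scale and other parameters. It simply applies the standard decay-counting relation to the 24-count 90% bound quoted in the Fig. 5 caption, using the atom count, efficiency and live-time from the text and assuming a 365.25-day year.

```python
import math

n_atoms = 3.39e26            # 136Xe atoms in the fiducial volume (from the text)
efficiency = 0.846           # total signal efficiency (Table 1)
live_time_yr = 477.60 / 365.25
signal_limit_counts = 24.0   # 90% CL upper bound on 0vbb counts (Fig. 5 caption)


def half_life_limit(n_atoms, efficiency, live_time_yr, max_counts):
    """Lower limit on T_1/2 when at most `max_counts` decays were detected.

    Valid when the live-time is tiny compared with the half-life, so the
    expected number of detected decays is N * eps * ln(2) * t / T_1/2.
    """
    return math.log(2.0) * n_atoms * efficiency * live_time_yr / max_counts


t_half_limit_yr = half_life_limit(
    n_atoms, efficiency, live_time_yr, signal_limit_counts
)
print(f"T_1/2 > {t_half_limit_yr:.2g} yr")
```

The result agrees with the quoted 1.1 × 10²⁵ yr to within rounding, which suggests that the count limit and the half-life limit are related in essentially this way.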

The results reported here supersede those of ref. 13, owing to the
increased exposure and improved analysis. The limit presented is, however,
not as strong as the limit from ref. 13, consistent with expected
statistical fluctuations in the data. An appropriate metric to characterize
the improvement of the experiment independent of such fluctuations
is the ‘sensitivity’, defined as the median expected 90% confidence
level half-life limit assuming the background estimated from the
maximum-likelihood fit and the absence of a 0νββ signal. We calculate this
metric using an ensemble of limits determined from Monte Carlo
pseudo-experiments and find the EXO-200 sensitivity to be 1.9 × 10²⁵ yr,
representing an improvement by a factor of 2.7 over ref. 13.
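
The definition of sensitivity as a median over pseudo-experiments can be illustrated with a simple counting toy. This sketch replaces the full spectral fit with a single-bin Poisson counting experiment and a classical upper limit, and it ignores systematic uncertainties. It is therefore not expected to reproduce the quoted 1.9 × 10²⁵ yr. The helper names, the seed and the number of pseudo-experiments are arbitrary choices.

```python
import math
import random
import statistics

BACKGROUND = 31.1  # expected ROI background counts (from the text)


def poisson_sample(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total


def upper_limit_90(n_obs, background):
    """Classical 90% CL upper limit on the signal mean, clamped at zero."""
    if poisson_cdf(n_obs, background) <= 0.10:
        return 0.0
    lo, hi = 0.0, 200.0
    for _ in range(60):  # bisection on the signal mean
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + background) > 0.10:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def median_signal_limit(n_experiments=2001, seed=1):
    """Median upper limit over background-only pseudo-experiments."""
    rng = random.Random(seed)
    limits = [
        upper_limit_90(poisson_sample(rng, BACKGROUND), BACKGROUND)
        for _ in range(n_experiments)
    ]
    return statistics.median(limits)


median_limit = median_signal_limit()
# Convert counts to a half-life with the atoms, efficiency and live-time from the text
sensitivity_yr = (
    math.log(2.0) * 3.39e26 * 0.846 * (477.60 / 365.25) / median_limit
)
print(f"median count limit: {median_limit:.1f}")
print(f"toy sensitivity: {sensitivity_yr:.2g} yr")
```

Because the median is taken over background-only outcomes, the metric does not depend on whether the actual data fluctuated up or down, which is the property the text relies on when comparing with ref. 13.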

In Fig. 6 we compare the 0νββ sensitivity and half-life limits from
the GERDA, KamLAND-Zen and EXO-200 experiments. Also shown
is the positive observation claim in ⁷⁶Ge from ref. 14. The results of the
present analysis are inconsistent with the central value of this claim at
90% confidence level for two of the four considered nuclear matrix
element calculations, namely, GCM²⁶ and NSM²⁷.

The first two years of EXO-200 data demonstrate the power of a
large and homogeneous LXe TPC in the search for 0νββ. Simulations
of the nEXO experiment, a proposed 5,000-kg LXe TPC based on the
EXO-200 design, show that the state-of-the-art background measured
in EXO-200 can be further improved by finer charge readout pitch (to
improve the SS/MS discrimination) and by lower electronic noise in
the scintillation channel. In addition, Xe self-shielding will become
more powerful in larger detectors, where the γ attenuation length at
energies near the Q-value becomes small with respect to the linear size
of the LXe vessel. This advantage only applies to monolithic,
homogeneous detectors.

Figure 6 | Comparison with recent results from ¹³⁶Xe and ⁷⁶Ge 0νββ
experiments. Sensitivity (orthogonal lines) and limits (arrows) from GERDA
and KamLAND-Zen are from refs 11 and 12, respectively. The diagonal lines
are derived from several recent nuclear matrix element calculations and the
phase-space factor from ref. 21, included to allow comparison between results
from the two nuclei: GCM²⁶, NSM²⁷, IBM-2²⁸ and RQRPA²⁹. Tick marks along
these lines indicate the associated effective neutrino mass in eV. The claimed
observation in ⁷⁶Ge (KK&K; ref. 14) is shown as a shaded grey band (CL,
confidence level). The previous EXO-200 limit and sensitivity from ref. 13 were
1.6 × 10²⁵ yr and 0.7 × 10²⁵ yr, respectively.
Targeted genome editing in human
repopulating haematopoietic stem cells

Roberta Mazzieri, Chiara Bonini, Michael C. Holmes, Philip D. Gregory, Mirjam van der Burg, Bernhard Gentner,
Eugenio Montini, Angelo Lombardo & Luigi Naldini


Targeted genome editing by artificial nucleases has brought the goal of site-specific transgene integration and gene correc- 
tion within the reach of gene therapy. However, its application to long-term repopulating haematopoietic stem cells (HSCs) 
has remained elusive. Here we show that poor permissiveness to gene transfer and limited proficiency of the homology- 
directed DNA repair pathway constrain gene targeting in human HSCs. By tailoring delivery platforms and culture 
conditions we overcame these barriers and provide stringent evidence of targeted integration in human HSCs by 
long-term multilineage repopulation of transplanted mice. We demonstrate the therapeutic potential of our strategy by 
targeting a corrective complementary DNA into the IL2RG gene of HSCs from healthy donors and a subject with X-linked 
severe combined immunodeficiency (SCID-X1). Gene-edited HSCs sustained normal haematopoiesis and gave rise to 
functional lymphoid cells that possess a selective growth advantage over those carrying disruptive IL2RG mutations. 
These results open up new avenues for treating SCID-X1 and other diseases. 


Haematopoietic stem cell (HSC)-based gene therapy has provided therapeutic
benefit in primary immunodeficiencies¹,², thalassaemia³ and
leukodystrophies⁴,⁵. Whereas more advanced vectors, such as lentiviral
vectors⁶, have shown improved safety and efficacy, the risk of insertional
mutagenesis⁷,⁸ and unregulated transgene expression⁹,¹⁰ remains a concern
when using semi-randomly integrating vectors. These adverse effects
may trigger oncogenesis, toxicity or elimination of the gene-modified cells.

Artificial endonucleases, such as zinc finger nucleases (ZFNs)¹¹,
transcription activator-like effector nucleases (TALENs)¹², and RNA-guided
nucleases (CRISPR/Cas)¹³, brought the possibility of gene targeting within
the reach of gene therapy¹⁴,¹⁵. These nucleases are used to efficiently and
specifically target a DNA double-strand break (DSB) to a pre-selected
genomic site¹⁴,¹⁶,¹⁷. According to the repair process that seals the break¹⁸,
the outcome can be disruption, reconstitution or editing of the original
sequence. If the DSB is sealed by the error-prone non-homologous
end-joining (NHEJ) pathway, insertions and deletions (indels) are common¹⁹.
If the DSB is sealed by the high-fidelity homology directed repair (HDR)
pathway, which acts preferentially during the S/G2 phase, the targeted
sequence can be edited by providing an exogenous DNA template flanked
by homologous sequences to the nuclease target site. Targeted editing
allows for the integration of an expression cassette into a safe genomic
harbour²⁰, or correcting disease-causing mutations by inserting a functional
copy of the affected gene downstream of its own promoter¹⁴.
Gene correction, as opposed to gene replacement, may not only restore
the function but also the physiological expression of the gene, a
long-sought goal of gene therapy.

Whereas gene disruption by ZFNs has been shown in human
haematopoietic stem/progenitor cells (HSPCs) assayed by repopulation of
SCID mice²¹, targeted gene editing in these cells has not been reported.
Here we identify and overcome major constraints limiting gene targeting
in HSPCs and provide proof-of-efficacy for this approach by functional
reconstitution of the IL2RG gene, mutations of which are responsible
for SCID-X1.


Site-specific integration in human HSPCs 


We developed a protocol for delivery of ZFNs and donor DNA template 
into human umbilical cord blood CD34" cells by mRNA electroporation 
and integrase-defective lentiviral vector (IDLV), and targeted integration 
ofa GFP cassette into the AA VS1 ‘safe harbour” or a mutational hotspot 
of IL2RG (refs 14, 15) (Fig. 1a, b and Extended Data Fig. 1). This protocol 
yielded on average 5% GFP+ cells in liquid culture and colony-forming
cell (CFC) assays and high percentages of indels in the respective ZFN
target sites, although after a transient cell loss (Fig. 1c and Extended
Data Fig. 2a–e). PCR analysis and Southern blot showed integration of
the GFP cassette at the intended targets in >90% of the GFP+ colonies
(n = 89) and in induced pluripotent stem cells obtained by reprogramming
the GFP+ sorted cells (Fig. 1d, e and Extended Data Fig. 2f, g). We
then transplanted the CD34+ cells treated for AAVS1 or IL2RG gene
targeting into NOD-SCID-Il2rg−/− (NSG) mice and found human cell
engraftment in all mice (Fig. 2a). In the first 8 weeks after transplant,
95% of the mice had circulating GFP+ cells (mean 6.2 ± 1.3%; Fig. 2b),
whereas only 42% of the mice maintained long-term GFP marking.
End-point analyses performed on the peripheral blood, spleen and
bone marrow showed that GFP+ cells were present within all human
haematopoietic lineages, including lymphoid and myeloid cells, and
erythroid precursors (mean 2 ± 0.8%; Fig. 2c). Similar frequencies of
GFP+ cells were found among primitive and committed progenitors in
the bone marrow (mean 2.2 ± 0.9%; Fig. 2d). PCR analyses confirmed
targeted integration in human lymphoid, myeloid and CD34+ cells from
the spleen and bone marrow of representative mice (Fig. 2e). CFC assays
on CD34+ cells from bone marrow showed GFP+ myeloid and erythroid
colonies (Fig. 2f) with targeted integration (Fig. 2e). Analysis of
bone marrow cells showed the occurrence of NHEJ-mediated indels 
in the ZFN target site in the majority of mice (64%; Fig. 2g) at higher 
levels than observed for GFP marking, indicating that DNA DSB induction
and repair by either HDR or NHEJ is compatible with haematopoietic
repopulation. Overall, these data show that our gene targeting

Figure 1 | Targeted integration into AAVS1 or IL2RG in umbilical cord
blood CD34+ cells. a, Schematic of the donor IDLV template containing a
GFP cassette driven by the phosphoglycerate kinase promoter (PGK) flanked by
sequences homologous to the genomic target locus, the target locus with the
ZFN cleavage site and the locus after HDR showing the PCR primers used to
assess 5′ or 3′ HDR-mediated integration junctions (black arrows). b, Flow
chart for targeted integration and cell analyses. c, Representative flow cytometry
dot plots (top) and percentages of GFP+ cells and NHEJ-induced indels at
the target locus (bottom) of cord blood CD34+ cells treated for targeted
integration into AAVS1 or IL2RG. Mean ± s.e.m. (AAVS1, n = 39 on 19 cord
blood donors; IL2RG, n = 10 on 9 cord blood donors). Unrelated donor
indicates cells treated with IDLV lacking homology to the target site. ND, not
detectable; NP, not performed. d, Targeting specificity in CFCs. Percentage
of colonies positive for both (HDR), either (HDR+NHEJ) or none (Unknown) 5′
and 3′ HDR junctions by PCR. Numbers of colonies screened are indicated inside
the bars. e, Southern blot (top) and PCR (bottom) analyses of iPSCs obtained by
reprogramming GFP+ or GFP− cells from panel c. TI, targeted integration.


protocol achieves site-specific integration in human multipotent long- 
term NSG repopulating cells (SRCs), surrogate readouts of HSCs. 


Low targeting efficiency in HSPCs 


The in vivo studies revealed that only ~40% of mice had on average 2% 
human GFP+ cells over the long term. These figures seem lower than
expected from transplanting CD34+ cells with ~5% targeting efficiency
in vitro (Supplementary Information) and suggest that either SRCs are
targeted less efficiently than the bulk CD34+ cells, or the gene-targeted
SRCs have a competitive disadvantage in vivo. We thus compared the
percentages of GFP+ cells among different subpopulations of cultured
CD34+ cells, prospectively identified by surface markers as primitive
(CD34+CD133+CD90+), early (CD34+CD133+CD90−) and committed
(CD34+CD133−) progenitors, and the differentiated cells (CD34−; Fig. 3a,
left panel). We found a decreasing frequency of GFP+ cells when moving
from the differentiated cells up the progenitor hierarchy. In primitive cells
the percentage of GFP+ cells was 20-fold lower than that measured in
differentiated cells. We thus investigated the potential rate-limiting steps
for gene targeting in primitive cells (Extended Data Fig. 3a–d). Transgene
expression on mRNA electroporation was similar among the subpopu- 
lations or slightly lower for the primitive cells. The level of NHEJ induced 
at the ZFN target site was higher in the primitive cells and progressively 
lower in committed and differentiated cells. This difference, however, 
diminished with time in culture, potentially due to the loss of some treated 
primitive cells. Indeed, induction of apoptosis was higher in this sub- 
set. Taken together these data indicate that the primitive cells are more 
sensitive to our treatment and less permissive to HDR and/or donor 
template delivery. 


Tailored conditions improve HSC gene targeting 
Because cell-cycle progression is a requirement of HDR and activation of 
the primitive progenitors may require longer stimulation, we postponed

Figure 2 | Transplantation of gene-targeted CD34+ cells in NSG mice.
CD34+ cells treated as in Fig. 1b were transplanted into NSG mice. a, Left:
human cell engraftment (CD45+) 12–23 weeks after transplant in the indicated
organs. BM, bone marrow; PB, peripheral blood. Right: percentage of the
indicated lineages within the human graft. Data from individual mice and
mean ± s.e.m. (n = 42 mice; 6 independent experiments on 13 cord blood
donors). b, Time course of human GFP+ cells in peripheral blood (PB) of mice.
Dashed lines indicate mice in which GFP+ cells were no longer detectable
(<0.1%) 12 weeks after transplant. c, GFP+ cells within the human graft in the
indicated organs (left) and lineages (right) (n = 18). d, GFP+ cells within
human primitive (CD34+CD38−) or committed (CD34+CD38+) progenitors
or differentiated cells (CD34−CD38+) in mouse bone marrow. e, PCR analysis
for targeted integration into AAVS1 on human lymphoid (CD19+) and
myeloid (CD33+ and CD13+) cells sorted from the mice and on GFP+ CFCs
from mouse bone marrow. f, Representative images of GFP+ colonies. Scale
bars, 0.5 mm. g, NHEJ at AAVS1 or IL2RG ZFN target sites on total bone
marrow cells from panel a. n = 25, 3 independent experiments.


the gene targeting procedure to the third day of culture (Fig. 3b). At this 
time, the cells are also likely to become more permissive to lentiviral 
vector transduction. Because increasing the length of time in culture 
promotes differentiation, we added the aryl hydrocarbon receptor
antagonist (StemRegenin 1, SR1) and/or 16,16-dimethyl-prostaglandin
E2 (dmPGE2) to the culture to better preserve stem and early progenitor
cells (Fig. 3c). The delayed protocol significantly increased (=2-fold)
the percentage of GFP+ cells in primitive cells (Fig. 3a, right panels).
SR1 slightly reduced the percentage of GFP+ cells observed within each
subpopulation but increased the yield of GFP+ CFCs and early progenitors,
consistent with the increased proportion of immature cells in
SR1-treated cultures (Fig. 3d, e and Extended Data Fig. 3e). The addition
of dmPGE2 increased the percentage of GFP+ cells in all subpopulations
when used alone and showed additive effects with SR1. Notably,
both the delayed treatment and the addition of SR1 and dmPGE2
increased the fraction of mice showing long-term engraftment with
GFP+ cells, which reached 100% when used in combination (Fig. 3f).
Human cell engraftment significantly increased after addition of SR1 to 
the culture (Fig. 3g) and was stable long-term (Extended Data Fig. 4a, b). 
Consistent with the increased GFP marking observed in vitro in the primitive
cells, the mean percentage of GFP+ cells showing long-term
engraftment in vivo increased with all types of delayed treatments (Fig. 3h).
GFP+ cells contributed to multiple lineages and to the progenitor compartment
in most mice (Extended Data Fig. 4c, d). Molecular analyses
on bone marrow cells showed evidence of targeted integration (Extended
Data Fig. 4e). Serial transplant of purified CD34+ cells from the bone
marrow of primary mice showed engraftment and differentiation of
targeted GFP+ cells in secondary recipients (Extended Data Fig. 4f, g).

Figure 3 | Gene targeting in primitive versus committed progenitors.
a, GFP+ cells within the indicated subpopulations 3 days after treatment for
targeted integration. The left-most panel shows results using the protocol
described in Fig. 1b. The other panels show the effect of longer pre-stimulation
and/or addition of the indicated drugs, as shown in the schematic in panel
b. Means ± s.e.m. (n = 31, 15, 14, 15, 7, 5 respectively on 37 total cord blood
donors). *P < 0.05; ***P < 0.001 (one-way ANOVA). c, Composition of
CD34+ cells cultured with or without SR1; subpopulations as in a.
Means ± s.e.m. (n = 4). d, Total (left) and GFP+ (right) colonies from CD34+
cells treated for targeted integration with or without SR1. Means ± s.e.m.


Overall, these data indicate that by tailoring experimental conditions 
we could improve the yield and frequency of targeted long-term SRCs. 


Targeted gene editing of IL2RG in HSPCs 


In the experiments described in Fig. 3g, h, the gene targeting construct 
was designed to insert a cDNA comprising exons 5-8 of IL2RG together 
with the GFP cassette into the IL2RG gene and used on CD34+ cells from
healthy male donors (Fig. 4a). In this way, the cDNA is transcribed from 
the endogenous IL2RG promoter and spliced to its upstream exons, thus 
providing a platform for correcting all SCID-X1-causing mutations down- 
stream of exon 4. To assess functional reconstitution of the targeted 
gene, we challenged the repopulated mice with a human tumour cell 
line (MDA-MB 231) engineered to express human interleukin (IL)-7, 
IL-15 and granulocyte-macrophage colony-stimulating factor (GM- 
CSF) (Fig. 4b). We previously reported that this challenge leads to im- 
proved reconstitution of functional human T and natural killer (NK) 
cells that eventually reject the tumour graft. T and NK cells are strictly
dependent on IL-2RG expression for survival and activity and are absent 
in SCID-X1. On tumour challenge, we observed a massive (mean 130 ±
40-fold) expansion of the human T and NK lineages in the repopulated
mice (Extended Data Fig. 5). GFP+ T and NK cells expressed IL-2RG on
the cell surface (Fig. 4c) and expanded similarly to their GFP-negative
counterparts in all mice (Fig. 4d). All repopulated mice effectively rejected
the allogeneic tumour, at variance with non-transplanted mice,

(n = 20, 14). e, Yield of GFP+ early progenitors relative to that obtained using
the original protocol of Fig. 1b. Means ± s.e.m. (n = 8, 7, 11, 10, 3, 5).
**P < 0.01 (one-way ANOVA). f, Percentage of NSG mice harbouring GFP+
cells 14 weeks after transplant of CD34+ cells treated with the indicated
targeted integration protocols. g, Time course of human engraftment in
peripheral blood. Means ± s.e.m. (24h SR1, n = 5; 48h SR1, n = 6; 48h, n = 5).
****P < 0.0001, ***P < 0.001 (two-way ANOVA). h, GFP+ cells within
CD45+ cells in peripheral blood 14 weeks after transplant. Means ± s.e.m.
(n = 4). Mice for the 24h SR1 condition are shown for comparison from
Fig. 2c.


underscoring the development of a functional human immune system 
(Fig. 4e). Whereas myeloid cells sorted from the mice showed high levels 
of NHEJ at the targeted IL2RG site, comparable to those observed in the 
CD34+ cells pre-transplant, B cells showed very little NHEJ, and T and
NK cells showed virtually no NHEJ (Fig. 4f). These findings reflect the 
marked counter selection of lymphoid cells carrying a disrupted IL2RG, 
as it naturally occurs with inherited SCID-X1 alleles, and confirm the 
functionality of the reconstituted gene in the expanded GFP+ cells. We
then assessed the T-cell-receptor repertoire of lymphocytes from the
engrafted mice and found substantial diversity with almost overlapping
polyclonal pattern between the GFP+ and GFP− sorted cell subsets
(Fig. 4g and Extended Data Fig. 6). The GFP+ and GFP− T cells expanded
ex vivo after polyclonal stimulation with the same kinetics in the presence
of γ-chain-dependent cytokines (IL-7 and IL-15), and proliferated
to a similar extent in response to the allogeneic MDA-MB 231 cells
(Fig. 4h, i and Extended Data Fig. 7a). GFP+ and GFP− T cells were
similarly comprised of CD8 and CD4 cell subsets, with most cells showing
effector phenotypes (Extended Data Fig. 7b, c). Consistently, both
GFP+ and GFP− cells robustly produced IFN-γ and IL-2 after phorbol
myristate acetate (PMA)-ionomycin stimulation or when co-cultured
with the allogeneic tumour at different effector/target ratios (Extended
Data Fig. 7d, e). Molecular analyses demonstrated that nearly all GFP+
cells contained targeted integration into IL2RG (Fig. 4j). We then measured
the phosphorylation of two downstream effectors in the signalling

Figure 4 | Functional reconstitution of IL2RG in the lymphoid progeny of
HSCs. a, Schematic of the IL2RG donor: a promoter-less IL2RG cDNA,
comprising exons 5–8 plus 3′ untranslated region (UTR), and a PGK-GFP
cassette are flanked by homologous sequences to those surrounding the IL2RG
ZFN target site. SA, splice acceptor; SD, splice donor. b, Flow chart of cell
transplantation, tumour challenge and analyses. c, Density plots of
γ-chain-expressing T (top) and NK (bottom) cells showing GFP marking (n = 7, 11).
d, Expansion of GFP− and GFP+ T and NK cells after tumour challenge.
e, Tumour weight 3 weeks after challenge, in mice transplanted (n = 16) or not
(n = 3). ****P < 0.0001 (unpaired t-test). f, NHEJ in the IL2RG gene on
CD34+ cells cultured in vitro and on their progeny sorted from the
transplanted mice. g, TCR complexity score calculated on GFP+ or GFP− T


cascade of γ-chain-coupled receptors (Fig. 4k and Extended Data Fig. 8).
The targeted T cells displayed similar kinetics and extent of phosphorylation
of STAT5 and AKT as their GFP− counterparts after stimulation
with increasing IL-15 and IL-2 doses. Overall, these data prove func- 
tional reconstitution of the edited IL2RG gene, which supported lym- 
phopoiesis and mature T-cell function indistinguishably from the 
wild-type allele. 


Specificity of IL2RG ZFNs on the HSC genome 

We previously performed a genome-wide screening in K562 cells to 
identify potential off-target sites of the IL2RG ZFNs used in this study
and found a low rate of indels in up to 12 genomic loci bearing homo- 
logy to the intended target site. We then determined whether these sites 
were also affected in the HSPCs treated here with ZFNs containing the 
same IL2RG DNA-binding domains but coupled to improved obligate 
heterodimeric FokI domains. We deep-sequenced the genomic regions 
encompassing the identified potential target sites on treated CD34+ cells
cultured in vitro and on human cells from the bone marrow of long- 
term engrafted NSG mice (Table 1 and Extended Data Fig. 9). The in- 
tended IL2RG target site showed 45-61% indel rate in the in vitro 
cultured cells and 20-43% in the in vivo engrafted cells. However, we 
detected indels only in two in vitro samples (at 0.17-0.7%) for the top two 
previously identified off-target sites, whereas from the in vivo samples

cells from transplanted mice. Human peripheral blood mononuclear cells
(PBMCs) were used as positive control. ****P < 0.0001 (one-way ANOVA).
h, Ex vivo growth of GFP+ and GFP− T cells from the spleen of transplanted
mice on stimulation (n = 4). i, Division index of GFP+ or GFP− T cells 7 days
after PHA stimulation or co-culture with tumour cells at the indicated effector- 
to-target (E/T) ratios. P = NS (not significant) (unpaired t-test). T cells from 
healthy donor (HD) were used as controls. j, Southern blot (top), PCR (middle) 
and GFP qPCR (bottom) analyses showing targeted integration of the 
corrective IL2RG cDNA in sorted GFP* T cells from h. UT, untreated cells. 
k, Heat map showing changes in phosphorylation levels of STAT5 after the 
indicated time of exposure to decreasing amounts of IL-2 or IL-15, on T cells 
from h. P = NS (two-way ANOVA). 


we found just one site with evidence of NHEJ (at 0.02%). Deep sequen- 
cing of all other putative off-target sites gave results not statistically 
different from the background error rate, which limits detection at 
0.01% in our analysis (see Supplementary Information). The elimina- 
tion of detectable off-target activity at some previously identified sites 
is consistent with the adoption of obligate heterodimeric FokI variants 
in this study, which would detarget activity from sites requiring ZFN 
homodimers. This analysis demonstrates a high specificity for the ZFNs 
used, evidenced by the 100-fold ratio between activities at the intended 
target site versus the top identified off-target site. It is possible, however, 
that additional off-target sites exist, which have not been identified by our 
previous screening. 
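
The specificity figures quoted above can be checked against Table 1. The following Python sketch is illustrative only and is not part of the authors' analysis pipeline. It transcribes the NHEJ percentages from Table 1, classifies each candidate site as homodimeric or heterodimeric from its ZFN dimer label, and computes the on-target to off-target indel ratio for every sample in which off-target indels were detected. The names `SAMPLES`, `TABLE1`, `dimer_type` and `on_off_ratios` are hypothetical, and the dimer labels and NS entries follow our reading of the table, which is an assumption where the printed values are unclear.

```python
# Illustrative sketch, not the authors' pipeline. Values are transcribed
# from Table 1; None stands for "NS" (not significantly above background).
SAMPLES = ["A", "B", "G", "B2", "CO", "E2"]  # 3 in vitro, 3 engrafted
NS = [None] * 6

# gene -> (ZFN dimer label, NHEJ % per sample). Assumption: dimer labels
# and NS entries follow our reading of the table.
TABLE1 = {
    "IL2RG":    ("L_5_R", [54.60, 61.18, 45.60, 26.08, 43.51, 20.07]),
    "SCARB1":   ("L_5_R", [0.17, 0.70, None, None, None, None]),
    "SLC31A1":  ("R_5_L", [0.61, None, None, None, 0.02, None]),
    "FAM133B":  ("R_6_R", NS),
    "KIAA0528": ("L_5_L", NS),
    "SF3B1":    ("L_5_L", NS),
    "A2BP1":    ("L_5_R", NS),
    "ANKFY1":   ("L_3_R", NS),
    "TRIM43":   ("L_4_L", NS),
    "SEC16A":   ("R_6_L", NS),
}


def dimer_type(label):
    """Classify a label such as 'L_5_R' (left ZFN, 5-bp spacer, right ZFN)."""
    left, _spacer, right = label.split("_")
    return "homodimer" if left == right else "heterodimer"


def on_off_ratios(table, on_target="IL2RG"):
    """Return {(gene, sample): on-target % / off-target %} where detected."""
    on_values = table[on_target][1]
    ratios = {}
    for gene, (_label, values) in table.items():
        if gene == on_target:
            continue
        for sample, on, off in zip(SAMPLES, on_values, values):
            if off is not None:
                ratios[(gene, sample)] = on / off
    return ratios


if __name__ == "__main__":
    for (gene, sample), ratio in sorted(on_off_ratios(TABLE1).items()):
        label = TABLE1[gene][0]
        print(f"{gene:8s} sample {sample:2s} {dimer_type(label):11s} "
              f"on/off = {ratio:.0f}-fold")
```

Under this reading of the table, every site with detectable off-target indels (SCARB1 and SLC31A1) carries a heterodimeric label, and the lowest per-sample ratio is about 87-fold, in line with the roughly 100-fold specificity stated in the text.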


IL2RG gene correction in SCID-X1 HSPCs 

We applied the optimized protocols developed for umbilical cord blood 
to CD34+ cells from adult bone marrow and obtained an overall gene-
targeting efficiency of 6 ± 0.5% (n = 4 donors) and a high rate of indels
induced at either the AAVS1 or IL2RG ZFN target sites (Fig. 5a). Targeting
was less efficient in the more primitive populations, although reaching 
similar values as those observed for umbilical cord blood cells.
Xenotransplantation proved the long-term multilineage repopulation capacity
of the targeted cells, with all transplanted NSG mice bearing GFP+
cells at frequencies comparable to those observed with umbilical cord

Table 1 | Target specificity of IL2RG ZFNs in HSPCs

Nearest RefSeq gene  Intron/Exon  Homology (%)  ZFN dimer  NHEJ (%), in vitro     NHEJ (%), engrafted in mouse
                                                           A      B      G        B2     CO     E2
IL2RG                Exon         100           L_5_R      54.60  61.18  45.60    26.08  43.51  20.07
SCARB1               Outside      70.8          L_5_R      0.17   0.70   NS       NS     NS     NS
SLC31A1              Intron       75            R_5_L      0.61   NS     NS       NS     0.02   NS
FAM133B              Intron       66.7          R_6_R      NS     NS     NS       NS     NS     NS
KIAA0528             Intron       87.5          L_5_L      NS     NS     NS       NS     NS     NS
SF3B1                Outside      66.7          L_5_L      NS     NS     NS       NS     NS     NS
A2BP1                Outside      75            L_5_R      NS     NS     NS       NS     NS     NS
ANKFY1               Exon         87.5          L_3_R      NS     NS     NS       NS     NS     NS
TRIM43               Outside      91.7          L_4_L      NS     NS     NS       NS     NS     NS
SEC16A               Exon         70.8          R_6_L      NS     NS     NS       NS     NS     NS

The on-target and candidate off-target sites of IL2RG ZFNs were deep sequenced
in the indicated progeny of treated CD34+ cells. Intron/exon indicates whether
the ZFN target site is within an exon, intron or intergenic (outside) near to
the RefSeq gene indicated in the first column. Homology indicates the
percentage of sequence identity to the IL2RG ZFN binding sites. ZFN dimer
indicates the site can be bound by a homodimeric (LL/RR) or heterodimeric
(LR/RL) ZFN pair; the number indicates the spacer length in base pairs between
the ZFN-binding sites. NHEJ (%) indicates the percentage of indels on treated
CD34+ cells cultured in vitro (samples A, B and G) and on human cells harvested
from the bone marrow of long-term engrafted NSG mice from Fig. 3 (mice B2, CO
and E2). For quantification see Supplementary Information and Extended Data
Fig. 9. NS, not significant (Fisher's exact test for contingency data).


blood cells (Fig. 5b). On the basis of these results, we then tested our 
gene correction strategy on bone marrow CD34+ cells from a symptomatic
four-month-old SCID-X1 patient bearing a nonsense mutation
in IL2RG exon 7 (c.865C>T; R289X). As expected for this mutation,

Figure 5 | Targeted integration and IL2RG gene correction in
bone-marrow-derived CD34+ cells from healthy donors and a subject with
SCID-X1. a, Top: GFP+ cells within the indicated subpopulations derived from bone
marrow CD34+ cells of adult healthy donors, treated for targeted integration
according to the best performing protocol from Fig. 3. Bottom: NHEJ at the
ZFN target site on total cells. Means ± s.e.m. (n = 10, 3 from 4, 3 donors for
AAVS1 or IL2RG, respectively). b, Top left: human cells in peripheral blood of
NSG mice transplanted with cells from a. Top right: percentages of the
indicated lineages within human cells 15 weeks after transplant. Bottom: GFP+
cells within the indicated populations. c, GFP+ cells measured as in a in bone
marrow CD34+ cells from a subject with SCID-X1 treated for IL2RG gene
correction. d, IL-2RG expression in myeloid (CD33+) cells from a GFP+
colony from the cells treated in c or from pooled wild-type colonies. e, PCR
analysis for targeted integration into IL2RG of the corrective cDNA on cells
from c and d. f, Expression of the fusion transcript bearing the corrective IL2RG
cDNA measured by qPCR (top) or RT-PCR (bottom) on cDNA from a GFP+
SCID-X1 myeloid colony. IL2RG targeted T cells from engrafted mice analysed 
in Fig. 4j were used as positive control, whereas a myeloid colony from wild- 
type bone marrow cells (WT) and PBMCs were used as negative controls. 


blood sampling or bone marrow harvest from the patient did not show 
any T or NK cells (Extended Data Fig. 10a, b). From 3% to 11% of the 
treated cell progeny became GFP+, depending on primitive versus
committed progenitor status (Fig. 5c). CFC assays yielded three GFP+
colonies out of ~100 scored (Extended Data Fig. 10c). Flow cytometry
showed normal expression of the γ-chain protein in the myeloid progeny
of the GFP+ CFCs (Fig. 5d). Polymerase chain reaction (PCR) analyses
of these colonies proved targeted integration into IL2RG leading to
expression of the expected fusion transcript between the corrective
cDNA and the upstream endogenous exons (Fig. 5e, f and Supplementary
Information). Overall, these data show reconstitution of a functional
IL2RG gene on targeted editing of a SCID-X1 allele in HSPCs.


Discussion 


Here we developed a strategy for targeted genome editing in human 
long-term repopulating HSCs and exploited it to insert transgenes 
into a genomic safe harbour or downstream of the promoter of an endo- 
genous gene to reconstitute its functional expression. Using the latter 
approach, we demonstrate correction of the defective IL2RG gene in 
HSPCs from a subject with SCID-X1. As we obtained consistent results 
when targeting two different loci, we expect that our genome editing 
strategy can be used to target a variety of genomic sites. As the pro- 
cedure uses a combination of IDLV infection and mRNA electropora- 
tion to deliver donor template and ZFNs, it has the potential to be 
applicable to a range of other genome editing tools. 

We found that primitive cells are more sensitive than committed 
progenitors to the cytotoxicity of the gene targeting procedure and less 
proficient at performing HDR, probably because of their quiescence or 
slow cycling. These findings are consistent with reports showing delayed 
DNA repair and enhanced apoptosis after γ-radiation in human HSCs
as compared to progenitors and preferential repair of DNA DSBs by
NHEJ in quiescent murine HSCs. By delaying the time of treatment
and exploiting compounds reported to support ex vivo maintenance
and expansion of HSCs, we were able to partially relieve the block
to HDR. This effect is probably due to an increased transit through the 
S/G2 phases of the cell cycle, when HDR can occur, and, possibly, upregu- 
lation of its endogenous machinery. Other beneficial effects might be 
increased permissiveness to gene delivery, more efficient mRNA trans- 
lation and reduced growth arrest and apoptosis in response to the gene 
targeting procedure. As improved procedures for ex vivo HSC expan- 
sion become available, they might increase the yield of gene targeted 
cells and allow their selection before in vivo administration. This would 
enable wide application of safe harbour sites, such as AAVS1, for robust
expression of therapeutic transgenes.

When applied to the functional correction of IL2RG in HSCs, we 
show that our strategy is compatible with normal development of the 
lymphoid lineages. IL2RG-edited lymphoid cells repopulated the mice 
and responded to γ-chain-dependent cytokines indistinguishably from
their unedited counterparts. On the contrary, lymphoid cells carrying
NHEJ-mediated IL2RG inactivation were counter-selected in the mice,
phenocopying the SCID-X1 disease.

This disease may be suitable for the clinical translation of targeted 
gene correction, as previous clinical studies have demonstrated the 
potential efficacy but also the risks of HSC gene therapy using early 
generation vectors. Whereas our strategy should abrogate the risk of
insertional mutagenesis and the concern for unregulated transgene 
expression, it remains to be established whether a potentially limiting 
amount of targeted HSPCs would enable effective and safe correction 
of SCID-X1 in humans (even though both HSCs and progenitors, which 
are more efficiently targeted, would help repopulating the absent lym- 
phoid lineages). Another concern is with the potential off-target activity 
of ZFNs, although our analysis showed a high specificity for the IL2RG 
ZFNs used in this study. Overall, we envisage a fairly straightforward
path to clinical translation of our strategy, considering that good man- 
ufacturing practice of lentiviral vector and mRNA electroporation are 
already established. Moreover, infusion of autologous gene-targeted cells 
could be performed without pharmacological conditioning, taking advan- 
tage of the selective advantage of the corrected cells. 


METHODS SUMMARY 


Donor IDLV for HDR were generated as described. ZFNs targeting intron 1 of 
PPP1R12C or exon 5 of IL2RG (refs 16, 22) were expressed by mRNA
electroporation. CD34+ cells from human umbilical cord blood or bone marrow were
used on approval by the San Raffaele Hospital Bioethical Committee, stimulated
in serum-free medium with early acting cytokines, infected with IDLVs at a
multiplicity of infection (MOI) of 100-500, and then electroporated with 175 µg ml−1
ZFN-encoding mRNAs. Targeted integration was assessed by PCR and Southern
blot while ZFN activity was determined by Cel1 assay and deep sequencing of
genomic target sites. The treated CD34+ cells at day 4 of culture were infused
intravenously into sublethally irradiated 8-11-week-old NOD-SCID-Il2rg−/− (NSG)
mice. To expand human T and NK cells, 4 x 10° MDA-MB 231 tumour cells
expressing human IL-7, IL-15 and GM-CSF were implanted orthotopically in the
mammary fat pad of NSG mice 14 weeks after CD34+ cell transplantation.
Functional assays on IL2RG-edited T cells collected from transplanted mice were 
carried out after stimulation with beads conjugated to anti-human CD3 and CD28 
antibodies and sorting for GFP expression. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 


Received 2 August 2013; accepted 29 April 2014. 
Published online 28 May 2014. 


1. Mukherjee, S. & Thrasher, A. J. Gene therapy for PIDs: progress, pitfalls and 
prospects. Gene 525, 174-181 (2013). 

2. Aiuti, A. et al. Lentiviral hematopoietic stem cell gene therapy in patients with 
Wiskott-Aldrich syndrome. Science 341, 1233151 (2013). 

3. Cavazzana-Calvo, M. et al. Transfusion independence and HMGA2 activation after
gene therapy of human β-thalassaemia. Nature 467, 318-322 (2010).

4. Cartier, N. et al. Hematopoietic stem cell gene therapy with a lentiviral vector in 
X-linked adrenoleukodystrophy. Science 326, 818-823 (2009). 

5. Biffi, A. et al. Lentiviral hematopoietic stem cell gene therapy benefits
metachromatic leukodystrophy. Science 341, 1233158 (2013). 

6. Naldini, L. Ex vivo gene transfer and correction for cell-based therapies. Nature Rev.
Genet. 12, 301-315 (2011).

7. Braun, C. J. et al. Gene therapy for Wiskott-Aldrich syndrome - long-term efficacy
and genotoxicity. Sci. Transl. Med. 6, 227ra33 (2014).

8. Cavazza, A., Moiani, A. & Mavilio, F. Mechanisms of retroviral integration and
mutagenesis. Hum. Gene Ther. 24, 119-131 (2013). 

9. Woods, N. B., Bottero, V., Schmidt, M., von Kalle, C. & Verma, I. M. Gene therapy:
therapeutic gene causing lymphoma. Nature 440, 1123 (2006). 

10. Gentner, B. et al. Identification of hematopoietic stem cell-specific miRNAs 
enables gene therapy of globoid cell leukodystrophy. Sci. Transl. Med. 2, 58ra84
(2010). 


240 | NATURE | VOL 510 | 12 JUNE 2014 


11. Urnov, F. D., Rebar, E. J., Holmes, M. C., Zhang, H. S. & Gregory, P. D. Genome editing
with engineered zinc finger nucleases. Nature Rev. Genet. 11, 636-646 (2010).

12. Joung, J. K. & Sander, J. D. TALENs: a widely applicable technology for targeted 
genome editing. Nature Rev. Mol. Cell Biol. 14, 49-55 (2013). 

13. Sander, J. D. & Joung, J. K. CRISPR-Cas systems for editing, regulating and 
targeting genomes. Nature Biotechnol. 32, 347-355 (2014). 

14. Lombardo, A. et al. Gene editing in human stem cells using zinc finger nucleases 
and integrase-defective lentiviral vector delivery. Nature Biotechnol. 25, 
1298-1306 (2007). 

15. Urnov, F. D. et al. Highly efficient endogenous human gene correction using 
designed zinc-finger nucleases. Nature 435, 646-651 (2005). 

16. Gabriel, R. et al. An unbiased genome-wide analysis of zinc-finger nuclease 
specificity. Nature Biotechnol. 29, 816-823 (2011). 

17. Mussolino, C. et al. A novel TALE nuclease scaffold enables high genome
editing activity in combination with low toxicity. Nucleic Acids Res. 39, 9283-9293 
(2011). 

18. Ciccia, A. & Elledge, S. J. The DNA damage response: making it safe to play with 
knives. Mol. Cell 40, 179-204 (2010). 

19. Tebas, P. et al. Gene editing of CCR5 in autologous CD4 T cells of persons infected 
with HIV. N. Engl. J. Med. 370, 901-910 (2014). 

20. Holt, N. et al. Human hematopoietic stem/progenitor cells modified by zinc-finger
nucleases targeted to CCR5 control HIV-1 in vivo. Nature Biotechnol. 28, 839-847 
(2010). 

21. Provasi, E. et al. Editing T cell specificity towards leukemia by zinc finger nucleases
and lentiviral gene transfer. Nature Med. 18, 807-815 (2012).

22. Lombardo, A. et al. Site-specific integration and tailoring of cassette design for
sustainable gene transfer. Nature Methods 8, 861-869 (2011).

23. Zou, J. et al. Oxidase-deficient neutrophils from X-linked chronic granulomatous
disease iPS cells: functional correction by zinc finger nuclease-mediated safe
harbor targeting. Blood 117, 5561-5572 (2011).

24. Li, H. et al. In vivo genome editing restores haemostasis in a mouse model of
haemophilia. Nature 475, 217-221 (2011).

25. Doulatov, S., Notta, F., Laurenti, E. & Dick, J. E. Hematopoiesis: a human
perspective. Cell Stem Cell 10, 120-136 (2012).

26. Boitano, A. E. et al. Aryl hydrocarbon receptor antagonists promote the expansion
of human hematopoietic stem cells. Science 329, 1345-1348 (2010).

27. North, T. E. et al. Prostaglandin E2 regulates vertebrate haematopoietic stem cell
homeostasis. Nature 447, 1007-1011 (2007).

28. Goessling, W. et al. Prostaglandin E2 enhances human cord blood stem cell
xenotransplants and shows long-term safety in preclinical nonhuman primate
transplant models. Cell Stem Cell 8, 445-458 (2011).

29. Escobar, G. et al. Genetic engineering of hematopoiesis for targeted IFN-alpha
delivery inhibits breast cancer progression. Sci. Transl. Med. 6, 217ra213
(2014).

30. Milyavsky, M. et al. A distinctive DNA damage response in human hematopoietic
stem cells reveals an apoptosis-independent role for p53 in self-renewal. Cell Stem
Cell 7, 186-197 (2010).

31. Mohrin, M. et al. Hematopoietic stem cell quiescence promotes error-prone DNA 
repair and mutagenesis. Cell Stem Cell 7, 174-185 (2010). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank D. Weissman for advice on mRNA production and the 
whole Naldini laboratory for discussion, F. Benedicenti for help with MiSeq sequencing, 
L. Sergi Sergi, T. Plati, V. Valtolina, B. Camisa and A. Ranghetti for technical help. SR1 
was provided by T. Boitano and M. Cooke under an MTA with the Genomics Institute of 
the Novartis Research Foundation. This work was supported by grants to L.N. from 
Telethon (TIGET grant D2), EU (FP7 222878 PERSIST, FP7 601958 SUPERSIST, ERC 
Advanced Grant 249845 TARGETINGGENETHERAPY) and the Italian Ministry of 
Health. 


Author Contributions P.G. designed experiments, performed research, interpreted 
data and wrote the manuscript. G.S. and G.E. performed research and interpreted data. 
T.D.T. performed mRNA production. C.F. characterized the corrective cDNA. A.C. and 
E.M. performed bioinformatics analysis of ZFN specificity. R.M. and D.M. developed the 
NSG human tumour rejection model. C.B. contributed to the T-cell studies. M.v.d.B. 
provided SCID-X1 patient cells. M.C.H. and P.D.G. provided ZFNs, interpreted data and 
edited the manuscript. B.G. set up culture conditions for HSC maintenance. A.L. and 
L.N. designed and supervised research, interpreted data and wrote the manuscript. L.N. 
coordinated the study. G.S. and G.E. contributed equally to this work. A.L. and L.N. share 
senior authorship. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare competing financial interests: details 
are available in the online version of the paper. Readers are welcome to comment on 
the online version of the paper. Correspondence and requests for materials should be 
addressed to L.N. (naldini.luigi@hsr.it). 


©2014 Macmillan Publishers Limited. All rights reserved 


METHODS 


Vectors and zinc finger nucleases. Homology-directed repair donor templates
were generated from HIV-derived, third-generation self-inactivating transfer
constructs. IDLV stocks were prepared as previously described and titred by a qPCR
designed to discriminate the reverse-transcribed vector genome from plasmid
carried over from transient transfection. Sequence and maps of AAVS1-PGK.GFP
were previously reported, whereas IL2RG-cDNA.PGK.GFP is described in detail
elsewhere (Firrito et al., manuscript in preparation). ZFNs that target intron 1 of
PPP1R12C or exon 5 of IL2RG were previously described. The latter pair was
modified to contain high-fidelity obligate heterodimeric FokI variants. Both pairs
of ZFNs were transiently expressed as mRNAs. Plasmid templates for ZFN mRNA
production (described in Extended Data Fig. 1b) were linearized and purified by
phenol/chloroform extraction followed by ethanol DNA precipitation. 2 µg per
reaction of linearized plasmid template was in vitro transcribed at 37 °C for 2 h
using T7 RNA polymerase and 7.5 mM nucleotide triphosphates (MEGAscript kit;
Ambion). Cap0 mRNAs were generated by supplementing the reactions with
6 mM m7(3'-O-methyl)-G(5')ppp(5')G, a non-reversible cap analogue (ARCA,
New England Biolabs) and lowering the concentration of GTP to 1.5 mM. After
TURBO DNase treatment (4 U per reaction, 1 h at 37 °C), mRNAs were poly(A)
tailed with E. coli Poly(A) Polymerase (8 U per reaction) for 1 h at 37 °C (PolyA
tailing kit; Ambion), yielding ≥150 nt polyA. Transcripts were purified by the
RNeasy Plus Mini Kit (Qiagen). All RNA samples were analysed by denaturing
agarose gel electrophoresis for quality assurance.

In vitro culture and assays on human cord blood or bone-marrow-derived
CD34+ cells. CD34+ cells were either freshly purified from human cord blood
after obtaining informed consent and on approval by the San Raffaele Hospital
Bioethical Committee, or purchased frozen from Lonza. 10° CD34+ cells per ml
were stimulated in serum-free StemSpan medium (StemCell Technologies)
supplemented with penicillin, streptomycin and human early-acting cytokines (for
cord-blood-derived cells: stem cell factor (SCF) 100 ng ml⁻¹, Flt3 ligand (Flt3-L)
100 ng ml⁻¹, thrombopoietin (TPO) 20 ng ml⁻¹ and interleukin 6 (IL-6) 20 ng ml⁻¹;
for bone-marrow-derived cells: SCF 300 ng ml⁻¹, Flt3-L 300 ng ml⁻¹, TPO
100 ng ml⁻¹ and IL-6 60 ng ml⁻¹; all purchased from Peprotech) for 24 or 48 h
and then infected with IDLVs at a multiplicity of infection (MOI) of 100-500. The
following day the cells were electroporated with 175 µg ml⁻¹ ZFN-encoding
mRNAs (P3 Primary Cell 4D-Nucleofector X Kit, program EO-100; Lonza).
For some experiments, the following drugs were added to the culture
media: 1 µM SR1 (provided by T. Boitano and M. Cooke, GNF) added at every
medium change, and 10 µM dmPGE2 (Cayman) added at the beginning of the
culture, 1 h before and just after electroporation. For CFC assays, 800 cells per
plate were seeded one day after electroporation in methylcellulose-based medium
(MethoCult H4434, StemCell Technologies). Two weeks after plating, colonies
were counted and identified according to morphological criteria.

Flow cytometry. For immunophenotypic analysis of CD34+ cells and their progeny
(performed on FACSCanto II; BD Pharmingen), we used the antibodies
reported in Supplementary Information. Single-stained and FMO-stained cells
were used as controls. For quantitative flow cytometry we used Flow-count
Fluorospheres (Beckman Coulter) according to the manufacturers' instructions.
Apoptosis analysis was performed on CD34+ cells one day after electroporation
using peripheral-blood-conjugated Annexin V (Biolegend) and Apoptosis Detection
kit with 7-Aminoactinomycin D (7AAD, BD Pharmingen) according to the
manufacturers' instructions. Percentages of live (7AAD−, AnnexinV−), early
apoptotic (7AAD−, AnnexinV+), late apoptotic (7AAD+, AnnexinV+) and necrotic
(7AAD+, AnnexinV−) cells are reported. Cell sorting was performed using
MoFlo XDP Cell Sorter (Beckman Coulter).

Molecular analyses. For molecular analyses, genomic DNA was isolated with
DNeasy Blood & Tissue Kit or QIAamp DNA Micro Kit (QIAGEN). Extraction of
genomic DNA from colonies in CFC assays was performed with Lysis Buffer as
previously described. NHEJ in AAVS1 locus or IL2RG gene was detected by the
mismatch selective Cel1 assay as previously described. Primers for PCR
amplifications to detect targeted integration or for the Cel1 assay are indicated in
Supplementary Information. PCR amplicons were resolved on agarose gel and
visualized by ethidium bromide staining. For Southern blot analyses, genomic
DNA was extracted with Blood & Cell Culture DNA Midi Kit (QIAGEN) and
digested using restriction enzymes (BglI for AAVS1 locus and BspHI for IL2RG)
and probes (see Supplementary Information) both located outside of the
homology regions included in the vectors. Matched DNA amounts were separated on
1% agarose, transferred to a nylon membrane and probed with 32P-radiolabelled
sequences indicated in Supplementary Information. Membranes were exposed in
a Storage Phosphor Screen. For qPCR analysis, 200 ng of genomic DNA were
analysed using primers and probes complementary to a vector backbone
sequence (Primer Binding Site), the GFP sequence and human TERT, the latter
amplification used as normalizer, as previously described. For gene expression
analysis on the SCID-X1 gene corrected colony, mRNA was extracted using the
RNeasy Micro Kit (QIAGEN) and cDNA was synthesized using the SuperScript
VILO cDNA Synthesis Kit (Invitrogen). The resulting cDNA was amplified
before qPCR by Taqman PreAmp Master Mix Kit (Applied Biosystems) according
to the manufacturers' instructions. Gene expression analysis was performed in
triplicate with a TaqMan Gene Expression assay specific for the recoded exon 7 of the
IL2RG gene (Applied Biosystems; see Supplementary Information) in a 7900HT
real-time PCR thermal cycler. The relative expression level of the recoded IL2RG
gene was calculated by the ΔΔCt method and represented as fold change relative
to the housekeeping gene control (HPRT), as previously described.
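
As a rough illustration of the comparative Ct calculation named above, the sketch below computes a fold change from Ct values normalized to a housekeeping gene. It assumes roughly 100% amplification efficiency (a doubling per cycle) and the use of a calibrator sample; the function name and all Ct values are hypothetical and are not taken from the paper.

```python
def fold_change_ddct(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change by the comparative Ct method.

    ct_target / ct_reference: Ct of the gene of interest and of the
    housekeeping gene (for example HPRT) in the test sample.
    ct_target_cal / ct_reference_cal: the same two Ct values in a
    calibrator sample (an assumption of this sketch).
    Assumes the amount of product doubles at every cycle.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical mean Ct values from triplicate wells.
print(fold_change_ddct(24.0, 20.0, 26.0, 20.0))
```

A lower Ct in the test sample than in the calibrator, after normalization, gives a fold change above 1.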

Mice transplantation and analysis. For the in vivo studies, 8- to 11-week-old
NOD-SCID-Il2rg−/− (NSG) mice were purchased from the Jackson Laboratory. The
experimental protocol was approved by the Institutional Animal Care and Use
Committee of the San Raffaele Scientific Institute. At day 4 of culture, 3 × 10°
gene targeted cord-blood-derived CD34+ cells (or 7.5 × 10° bone-marrow-derived
cells) were infused intravenously into the mice after sub-lethal irradiation
(200 cGy). Sample size was determined by the total number of available treated
cells. Mice were attributed to each experimental group randomly. The MDA3 human
mammary carcinoma cell line was obtained by stable transduction of MDA-MB
231 cells with lentiviral vectors expressing the human cytokines GM-CSF, IL-7
and IL-15 from the PGK promoter, as previously described. 4 × 10° MDA3 cells
were implanted orthotopically in the mammary fat pad of NSG mice 14 weeks
after CD34+ cell transplantation or in age-matched untransplanted NSG mice.
Human CD45+ engraftment was followed by serial collections of blood from the
mouse tail and, at the end of the experiment (12-23 weeks after transplantation),
bone marrow and spleen were harvested and analysed for lineage composition
and GFP content (see Supplementary Information for gating strategy).

T lymphocyte analyses. Human T cells were enriched from splenocytes harvested
from the transplanted mice using magnetic beads conjugated to anti-human CD3
and CD28 antibodies (Dynabeads human T-activator CD3/CD28; Invitrogen),
following the manufacturers' instructions, and grown in Iscove's Modified Dulbecco's
Media (IMDM) (GIBCO-BRL) supplemented with penicillin, streptomycin, 10%
FBS and 5 ng ml⁻¹ each of IL-7 and IL-15 (PeproTech).

For TCR V-β repertoire analysis, mRNA was extracted from the expanded T
cells using the RNeasy Mini Kit (QIAGEN) and cDNA was synthesized using the
SuperScript VILO cDNA Synthesis Kit (Invitrogen). Multiplex PCRs optimized
from a previous work were carried out on cDNA using V-β primers specific for 4
or 5 different families and a single FAM-labelled C-β primer. PCR products were
fractionated on 6% polyacrylamide gel, visualized on Molecular Dynamics Typhoon
9410 (Amersham Biosciences) and analysed using ImageQuant TL 7.0 (Amersham
Biosciences). V-β complexity was determined by counting the number of distinct
peaks and graded on a score of 0-8 (ref. 36). The overall TCR complexity score was
determined by summing up all 23 individual TCR V-β family-specific scores.
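
The scoring rule above amounts to a bounded sum, which can be sketched as follows. This is only an illustration of the arithmetic: the family labels, the example scores and the function name are hypothetical, and the way each per-family score is graded from the peak count is left to ref. 36.

```python
def tcr_complexity_score(family_scores, n_families=23, max_score=8):
    """Sum per-family V-beta scores (each graded 0 to max_score) into one value.

    family_scores maps a family label to its score. The function checks
    that exactly n_families families were scored and that every score is
    within range before summing.
    """
    if len(family_scores) != n_families:
        raise ValueError("expected %d families, got %d" % (n_families, len(family_scores)))
    for family, score in family_scores.items():
        if not 0 <= score <= max_score:
            raise ValueError("score out of range for %s: %r" % (family, score))
    return sum(family_scores.values())


# Hypothetical sample in which every family shows a fully diverse profile.
fully_diverse = {"Vb%d" % i: 8 for i in range(1, 24)}
print(tcr_complexity_score(fully_diverse))
```

With 23 families graded 0-8, the overall score ranges from 0 to 184.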

To analyse phosphorylation of downstream effectors of the IL-2RG pathway, T
cells were starved overnight at 37 °C in IMDM without cytokines and then
stimulated with IL-2 (1,000 IU ml⁻¹, 100 IU ml⁻¹, 10 IU ml⁻¹; purchased from Novartis)
or IL-15 (10 ng ml⁻¹, 5 ng ml⁻¹, 1 ng ml⁻¹) at 37 °C for increasing times. Cells were
then fixed in PBS 2% paraformaldehyde (PFA) for 10 min at 37 °C, and after
washing in PBS 0.1% BSA (3 times), they were permeabilized with ice-cold absolute
methanol for 7 min on ice. After 60 min incubation of each time point of cytokine
stimulation with different dilutions of Pacific blue succinimidyl ester (PBSE) (Life
Technologies), cells were washed, pooled and stained for flow cytometry.

For proliferation assay, 10° T cells were labelled with Cell Proliferation Dye
eFluor 670 (eBioscience) according to the manufacturer's instructions. Labelled T
cells were co-cultured in IMDM supplemented with penicillin, streptomycin, 10%
FBS and 5 ng ml⁻¹ each of IL-7 and IL-15, with different dilutions of MDA-MB
231 cells previously irradiated at 10,000 rad or stimulated for 3 days with PHA
(2 µg ml⁻¹). After 7 days of culture, cells were analysed by flow cytometry.
Division index was calculated according to FlowJo software rules.

For IFN-γ release assay, T cells were stimulated at 37 °C for 6 h with PMA
(50 ng ml⁻¹) and ionomycin (1 µg ml⁻¹) in the presence of 2 µl per ml of culture of
BD Golgi Plug (BD Pharmingen). Cells were then fixed and permeabilized using
the BD Cytofix/Cytoperm Kit (BD Pharmingen) and stained for flow cytometry.

Elispot assay for IFN-γ release was performed as previously described.
When the number of measured spots was above the detection limit of the plate
reader (Eli.Expert, A.EL.VIS.), it was arbitrarily set to 500.

Deep sequencing of potential IL2RG ZFN off-target loci. Genomic DNA from
ZFN-treated CD34+ cells or their progeny harvested from transplanted mice was
amplified using the REPLI-g Mini Kit (QIAGEN) and the top-ranking candidate
off-target genomic loci from our previous study were amplified by PCR, generating
amplicons of 389 ± 20 bp surrounding the potential ZFN binding site. PCR
products were purified using Agencourt AMPure XP beads (Beckman Coulter) and
adaptors were added by TruSeq DNA LT Sample Prep Kit (Illumina). To build an
equimolar library, PCR products were quantified with KAPA Library
Quantification Kit for Illumina sequencing platforms (KAPABIOSYSTEMS) on C1000
Thermal Cycler (BIO-RAD) and sequenced on MiSeq Illumina Platform using
MiSeq Reagent v.3 (Illumina). Raw paired-end reads were joined with the Fastq-Join
program from the EA-Utils NGS suite (http://code.google.com/p/ea-utils/) and
aligned to the specific genomic target sequences using Burrows-Wheeler
Alignment Tool with maximal exact match version, BWA-MEM. Alignments were
evaluated and filtered using SAMtools, Picard (http://picard.sourceforge.net)
and BAMtools. Sequences with only primary alignments with quality >15 were
kept for further analysis. Deletions and insertions (indels) were quantified by a
custom pipeline based on Python (http://www.python.org, version 2.7.6) and the
PySAM library (https://code.google.com/p/pysam, version 0.7.5). Sequences with
indels of ≥1 bp located within a region encompassing the spacer ± 5 bp on each
side were considered as ZFN-induced genome modifications. Coverage statistics
were computed by the binomial distribution online calculator
(http://www.vassarstats.net/binomialX.html). Fisher's exact tests were computed with the SciPy
Python package (http://www.scipy.org, version 0.9.0) within the 'stats' library.
Multiple sequence alignment for indels visualization and plot was performed with
ClustalW2 and MView (http://bio-mview.sourceforge.net/).
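
The indel-calling criterion described above (an insertion or deletion of at least 1 bp within the spacer plus 5 bp on each side) can be sketched in a few lines. This is not the authors' pipeline, which was built on PySAM: the sketch below parses SAM CIGAR strings with the standard library only, uses 0-based reference coordinates, and all function names, coordinates and reads are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = frozenset("MDN=X")  # CIGAR operations that advance the reference


def modification_window(spacer_start, spacer_end, flank=5):
    """Return the half-open reference interval: spacer plus flank on each side."""
    return spacer_start - flank, spacer_end + flank


def has_indel_in_window(ref_start, cigar, window):
    """True if the alignment carries an insertion or deletion inside the window."""
    win_start, win_end = window
    pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        # An insertion sits between reference bases pos-1 and pos.
        if op == "I" and win_start <= pos <= win_end:
            return True
        # A deletion removes reference bases pos .. pos+length-1.
        if op == "D" and pos < win_end and pos + length > win_start:
            return True
        if op in REF_CONSUMING:
            pos += length
    return False


def fraction_modified(reads, spacer_start, spacer_end, flank=5):
    """Fraction of (ref_start, cigar) alignments with an indel in the window."""
    if not reads:
        return 0.0
    window = modification_window(spacer_start, spacer_end, flank)
    modified = sum(1 for start, cigar in reads if has_indel_in_window(start, cigar, window))
    return modified / len(reads)


# Hypothetical alignments on an amplicon whose spacer occupies positions 100-105.
reads = [(50, "100M"), (50, "50M2D48M"), (50, "10M1I89M"), (50, "45M1I54M")]
print(fraction_modified(reads, 100, 106))
```

Counts of modified and unmodified reads obtained this way for treated and control samples would then form the 2 × 2 table passed to a Fisher's exact test.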

SCID-X1 cells. Peripheral blood and bone-marrow samples from a subject with
SCID-X1 were obtained according to the guidelines of the Medical Ethics
Committee of the Erasmus MC, University Medical Center Rotterdam, the Netherlands.

Statistical analyses. Statistical analyses were performed by unpaired Student's
t-test for pairwise comparison or one-way or two-way analysis of variance
(ANOVA) with Bonferroni's multiple comparison post-test for three or more
groups, as indicated. Values are expressed as mean ± standard error of the mean
(s.e.m.). Per cent values were transformed into a log-odds scale (log(%X/(100 − %X)))
to perform statistical analyses.
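
The log-odds transformation can be written out as a short sketch. The base of the logarithm is not stated in the text, so the natural logarithm is assumed here; the function name and example values are hypothetical.

```python
import math


def log_odds(percent):
    """Transform a percentage X into log(X / (100 - X)).

    Natural logarithm is an assumption of this sketch. Values of exactly
    0 or 100 have no finite log-odds and are rejected.
    """
    if not 0.0 < percent < 100.0:
        raise ValueError("percentage must lie strictly between 0 and 100")
    return math.log(percent / (100.0 - percent))


# Hypothetical percentages of GFP+ cells from three samples.
for value in (5.0, 50.0, 75.0):
    print(value, log_odds(value))
```

The transformation maps percentages from the bounded 0-100 range onto an unbounded scale, where tests that assume approximately normal data, such as the t-test and ANOVA, are more appropriate.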


32. Matrai, J. et al. Hepatocyte-targeted expression by integrase-defective lentiviral
vectors induces antigen-specific tolerance in mice with low genotoxic risk.
Hepatology 53, 1696-1707 (2011).

33. Hockemeyer, D. et al. Efficient targeting of expressed and silent genes in human
ESCs and iPSCs using zinc-finger nucleases. Nature Biotechnol. 27, 851-857
(2009).

34. Miller, J. C. et al. An improved zinc-finger nuclease architecture for highly specific
genome editing. Nature Biotechnol. 25, 778-785 (2007).

35. Akatsuka, Y., Martin, E. G., Madonik, A., Barsoukov, A. A. & Hansen, J. A. Rapid
screening of T-cell receptor (TCR) variable gene usage by multiplex PCR:
application for assessment of clonal composition. Tissue Antigens 53, 122-134
(1999).

36. Wu, C. J. et al. Reconstitution of T-cell receptor repertoire diversity following T-cell
depleted allogeneic bone marrow transplantation is related to hematopoietic
chimerism. Blood 95, 352-359 (2000).

37. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics 25, 1754-1760 (2009).

38. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25,
2078-2079 (2009).

39. Barnett, D. W., Garrison, E. K., Quinlan, A. R., Stromberg, M. P. & Marth, G. T.
BamTools: a C++ API and toolkit for analyzing and managing BAM files.
Bioinformatics 27, 1691-1692 (2011).

40. Larkin, M. A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23,
2947-2948 (2007).

41. Harris, D. T., Badowski, M., Balamurugan, A. & Yang, O. O. Long-term human
immune system reconstitution in non-obese diabetic (NOD)-Rag (-)-y chain (-)
(NRG) mice is similar but not identical to the original stem cell donor. Clin. Exp.
Immunol. 174, 402-413 (2013).

42. Gattinoni, L. et al. A human memory T cell subset with stem cell-like properties.
Nature Med. 17, 1290-1297 (2011).

43. Cieri, N. et al. IL-7 and IL-15 instruct the generation of human memory stem T cells
from naive precursors. Blood 121, 573-584 (2013).


©2014 Macmillan Publishers Limited. All rights reserved 


ARTICLE 


[Extended Data Figure 1, panels a-e: image not reproduced. Recoverable axis labels: time after treatments (days); viable cells (%); µg mRNA per ml of each ZFN; time after IDLV transduction (h); time after ZFNs electroporation (h).]

Extended Data Figure 1 | Optimization of gene targeting protocol in
CD34+ cells. We optimized the delivery platform, dose and timing of ZFNs
and HDR donor template administration. a, Performance of different gene
delivery platforms. Cord blood (CB) CD34+ cells were pre-stimulated with
early acting cytokines for 24 h and transduced with GFP-encoding IDLV (MOI
5 × 10°) or adenoviral vector serotype 5/35 (MOI 5 × 10°), or electroporated
with GFP-expressing mRNA (500 µg ml⁻¹) or plasmid DNA (25 µg ml⁻¹).
The cells were analysed by flow cytometry at the indicated days after the
procedure. Top: representative density plots of GFP expression 24 h post
treatment. SSC, side scatter. Bottom: kinetics of transgene expression measured
as a percentage of GFP+ cells (left) and relative GFP fluorescence intensity
(RFI, measured as the ratio between the mean fluorescence intensity of the
treated cells at each time point and that of the untreated cells) in arbitrary units (right).
UT, untreated cells. mRNA electroporation outperformed all approaches tested
in terms of frequency of transfected cells and protein expression level. Note that
although IDLV infects the majority of cells in these conditions, its expression is
constrained by its unintegrated nature. Because mRNA transfection drives a
robust but short-lived spike of expression, it appeared best suited for ZFN
delivery, allowing proficient activity of the nucleases at the genomic target site
while avoiding prolonged exposure. b, Top: schematic representation (not to
scale) of a plasmid DNA template used for in vitro mRNA transcription with
the T7 promoter, the Kozak sequence and the XbaI restriction site used for
the plasmid linearization depicted. The protein domains of a ZFN are shown
within the open reading frame (ORF). NLS, nuclear localization signal; ZFP,
zinc finger protein; FokI, FokI nuclease domain. Middle: representative
denaturing gel electrophoresis of in vitro transcribed mRNAs encoding
the pair of ZFNs specific for AAVS1, before (−) and after enzymatic
polyadenylation (pA). The ZFN mRNAs were produced either as two separate
transcripts (ZFN-L and ZFN-R) or as a single construct coding for both ZFNs
linked by a Tav.2A self-cleavage peptide sequence (ZFN-L.2A.ZFN-R;
bottom left). Bottom right: cord blood CD34+ cells were electroporated either
with the two separate transcripts or with the single mRNA co-expressing both
ZFNs. ZFN activity was measured on treated cells as percentage of NHEJ
detected at the ZFN target site by Cel1 assay 10 days after electroporation.
c, Dose-response for AAVS1 targeting ZFN mRNAs in CD34+ cells. Cord
blood CD34+ cells were transduced with integrase-defective lentiviral vector
(IDLV) bearing homology to the AAVS1 locus and expressing GFP, and then
electroporated with the indicated escalating doses of AAVS1 ZFN mRNAs.
ZFN activity was scored by measuring the extent of NHEJ-mediated repair at
their genomic target site, and HDR was scored by the frequency of GFP+ cells
obtained in liquid culture. Left: NHEJ measured by Cel1 assay at day 10 after
electroporation for the indicated dose of mRNA. Means ± s.e.m. (n = 3). Right:
percentages of GFP+ cells by flow cytometry 3 days after treatment. The
percentages of viable cells (indicated on top of the histogram) were calculated as
percentages of 7AAD negative cells gated on singlets. A dose-dependent
increase in the percentage of NHEJ and GFP+ cells was observed for the first
three mRNA doses, whereas the highest dose caused a significant reduction
in the number of viable cells, and a reduction in the efficiency of gene
targeting. Based on these data, we selected the dose of 175 µg ml⁻¹ RNA to
perform all further experiments. d, Choice of delivery platform for the HDR
donor template. Cord blood CD34+ cells were either transduced with the
AAVS1 donor IDLV and electroporated with the cognate ZFN mRNAs,
or co-electroporated with AAVS1 donor plasmid DNA and ZFN mRNAs.
Left: cell viability measured by flow cytometry 3 days after electroporation,
comparing untreated cells (UT) and gene-targeted cells using IDLV or
plasmid as donor templates. ****P < 0.0001 (one-way ANOVA with
Bonferroni's multiple comparison post-test). Right: percentage of GFP+ cells
using either donor template. Means ± s.e.m. (UT, n = 3; IDLV, n = 18;
plasmid, n = 10). *P < 0.05 (unpaired t-test). IDLV infection outperformed
plasmid DNA electroporation in terms of the frequency of GFP+ cells and cell
viability, consistent with our previous findings in other primary cell types.
e, Schedule optimization for ZFNs and donor template delivery. After one day
of pre-stimulation, cord blood CD34+ cells were first transduced with the
AAVS1 donor IDLV and then electroporated at the indicated hours
post-infection with ZFN mRNAs (left) or, conversely, first electroporated with
ZFN mRNAs and then transduced with IDLV (right). The time lines of the
experiments are shown on top of the histograms. The percentages of GFP+ cells
measured by flow cytometry 3 days after treatment and NHEJ measured by
Cel1 assay 10 days after treatment are shown on bottom left. Bottom right:
the percentage of GFP+ cells is expressed as fold relative to the percentage achieved in
the same experiment with the best strategy on the left. The highest frequency of
GFP+ cells was obtained by delivering the IDLV-based donor template
24 h before ZFN mRNA electroporation. Sequential exposure to the two
delivery platforms avoids competition for cell entry and minimizes mutual
interference, probably due to activation of innate responses to exogenous
nucleic acids or the timing of peak ZFN expression relative to IDLV reverse
transcription and nuclear import.
Extended Data Figure 2 | Impact on cell viability and specificity of
integration in CD34+ cells treated for targeted integration. a, Percentage of
GFP+ cells measured in liquid culture 3 days after treatment as in Fig. 1b for
targeted integration into AAVS1 or IL2RG and for the corresponding GFP+
colonies counted in CFC assays 2 weeks after plating. Means ± s.e.m. (n = 7
cord blood donors). ns, not significant (unpaired t-test). b, Representative
bright-field and fluorescence microscopy images of GFP+ erythroid and
myeloid colonies. Scale bar, 0.5 mm. c-e, The impact of the gene-targeting
procedure on the viability, proliferation and clonogenic output of the CD34+
cells was analysed. c, Representative growth curves of CD34+ cells treated for
targeted integration or transduced with IDLV only or untreated (UT).
d, Apoptosis analysis performed 24 h after treatment on cells in liquid culture
and e, number of CFCs plated one day after the indicated treatments.
Means ± s.e.m.; n = 2 or 3, respectively. Overall, there was a transient
reduction in viable cell number 24 h after electroporation, also observed in CFC
yield, which resulted from the combined exposure to electroporation, ZFN
mRNA and IDLV. However, the surviving cells grew with similar kinetics to the
untreated controls in liquid culture and gave rise to similar proportions of
myeloid and erythroid colonies. f, g, Targeting specificity of integration.
f, Southern blot (top) and PCR (bottom) analyses for targeted integration into
IL2RG on iPSCs obtained by reprogramming GFP+ cells from Fig. 1c. UT,
untreated cells. g, Genomic DNA from GFP+ colonies was analysed by PCR for
targeted integration into AAVS1 or IL2RG. The gels show the PCR amplicons
for either the 5' or 3' HDR integration junction at the genomic target site and a
control locus (CCR5). The percentages of colonies positive for targeted
integration by PCR are reported in Fig. 1d.
Extended Data Figure 3 | Investigating lower gene-targeting efficiency in
the more primitive cells. a, Gating strategy used to identify subpopulations of
cord blood cells according to expression of CD90, CD133 and CD34 surface
markers. b, After 24 h of pre-stimulation, CD34+ cells were electroporated with
175 µg of GFP mRNA (the selected dose of a ZFN mRNA from Extended Data
Fig. 1). Flow cytometry analysis was performed 2 days later using the gating
strategy shown in Fig. 3a. Bars represent the percentage of GFP+ cells (plotted
on left axis) while the line shows the level of transgene expression (plotted on
the right axis as MFI, measured in arbitrary units). Means ± s.e.m. (n = 16 on 6
cord blood donors). c, CD34+ cells treated for targeted integration were
sorted by FACS one day after electroporation according to the gating strategy
shown in Fig. 3a. The sorted populations were sampled at the indicated times
and levels of NHEJ at the ZFN target site (AAVS1) were determined by Cel1
assay (n = 3). d, Apoptosis analysis performed one day after electroporation on
CD34+ cells transduced with IDLV and electroporated with ZFN mRNAs.
Percentages of live (7AAD−, AnnexinV−), early apoptotic (7AAD−,
AnnexinV+), late apoptotic (7AAD+, AnnexinV+) and necrotic (7AAD+,
AnnexinV−) cells. Means ± s.e.m. (n = 5 on 4 cord blood donors).
e, Representative growth curves of CD34+ cells cultured with or without
SR1 and treated (TI) or not (UT) with the protocol described in Fig. 1b. Note
that the TI and UT growth curves are reproduced from Extended Data Fig. 1c.
Addition of SR1 to the culture did not change the proliferation rate of the cells.
Because SR1 increased the percentage of more primitive cells in culture (Fig. 3c)
but not the total number of cells, the absolute number of primitive cells is
larger in cultures containing SR1.
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Extended Data Figure 4 | Long-term multilineage engraftment of gene- 
targeted CD34* cells in primary and secondary NSG mice. a, Percentages of 
human cells in the indicated organs of NSG mice 15-23 weeks after 
transplantation with CD34™ cells treated with the improved protocols from 
Fig. 3a with or without SRI. b, Percentage of the indicated lineages within 
human cells in peripheral blood of mice 14 weeks post-transplant. 

Means + s.e.m. (48h, n = 8; 48h SR1, n = 11; 48h PGE2, n = 3; 48h PGE2 
SRI, n = 6). Overall, the addition of SR1 and PGE2 to the in vitro culture did 
not significantly affect the in vivo differentiation of treated cells. Notably, the 
increased human engraftment achieved with the optimized culture conditions 
(as illustrated in Fig. 3g) correlates with increased T-cell output. c, Multi- 
lineage GFP marking in individual NSG mice transplanted with CD34* cells 
treated with the indicated protocols for targeted integration. Percentages of 
GFP* cells were calculated within the CD45* Lin” populations (represented 
with different data-point shapes) in different organs (represented by different 
data-point colours). The analysis was performed on peripheral blood 14 weeks 
after transplantation and on spleen and bone marrow at the end of the 
experiments. Only mice displaying =0.1% GFP* cells are represented in the 
graph (n = 2 independent experiments). Note that when using the improved 
protocols for targeted integration GFP” cells are found in multiple lineages in 
all mice. d, Analysis of the primitive human compartment in the bone marrow 
of transplanted mice from c. Top: gating strategy used to define progenitors 
(CD34 CD38"), multilymphoid progenitors (MLPs; CD34* CD38~ 
CD90I°"~ CD45RA*), multipotent progenitors (MPPs; CD34" CD38 
CD90~ CD45RA_ ) and HSCs (CD34* CD38” CD90* CD45RA_ ). Bottom: 
percentages of GFP” cells within the defined populations. Means + s.e.m. (48 h 
SRI, n = 4; 48h PGE2, n = 3; 48h PGE2 SR1, n = 5). e, DNA from total bone 
marrow cells of transplanted mice was analysed by PCR to determine TI into 
IL2RG. Each column represents one mouse. The schematics show the different 
sets of primers used to detect on-target insertions mediated by HDR or NHEJ 


(the latter shown with the vector in sense or reverse orientation with respect to 
IL2RG). Whereas evidence of HDR-mediated insertion of the cassette was 
retrieved from all mice assayed, there was also indication of some NHEJ- 
mediated integration of the donor IDLV. Trapping of IDLV at sites of NHEJ 
has been previously reported’®. We note that our strategy could be adjusted to 
also exploit this type of insertion to drive transgene expression” and potentially 
increase the overall efficiency of gene correction (see schematic in Fig. 4a). In 
such case, one should target insertion within an intron of the gene so that the 
splice acceptor site of the corrective cDNA is next in line for processing with the 
splice donor site of the upstream endogenous exon and any intervening 
sequence can be spliced out from the chimaeric transcript leading to 
reconstitution of a functional open reading frame (provided that the insertion 
occurred in the same orientation as gene transcription). An additional benefit 
of targeting an intronic sequence would be to spare exons from disruption by 
NHEJ, although this is of minor concern when dealing with already defective 
alleles. f, 15-23 weeks after the primary transplant, human CD34+ cells were 
purified from the bone marrow of 11 mice from c (mice were chosen among 
those best engrafted with GFP+ cells in the different CD34+ cell treatment 
groups) and transplanted (one mouse to one mouse) into 7-11-week-old NSG 
mice. Secondary recipient mice were monitored for engraftment of human 
CD45+ and GFP+ cells at 8 and 12 weeks post-transplant in peripheral blood, 
and on bone marrow at the end of the experiments. Top left: percentages of 
human cells in the indicated organs. Top right: percentages of the indicated 
lineages within the human cells. B cells were defined by expression of CD19 and 
myeloid cells were defined by expression of CD13 in peripheral blood or CD33 
in bone marrow. Dots represent individual mice. Bottom left: percentages of 
GFP+ cells within the human graft in the indicated organs and (bottom right) 
within the indicated lineages, as on top. g, Human cells from the bone marrow 
of the top 2 engrafted secondary recipient mice from f were analysed by PCR for 
targeted integration into IL2RG. 

Extended Data Figure 5 | Expansion of lymphoid cells in the transplanted 
mice after tumour challenge. a, Left: percentage of human cells in the 
peripheral blood of mice transplanted with male CD34+ cells treated as 
indicated for targeted integration into IL2RG. Analyses performed at the time 
of tumour injection (top panel) and 3 weeks later (bottom panel). Right: 
percentages of T and NK cells (CD3+ and CD16/56+ cells, respectively) 
measured within the human CD45+ cells in peripheral blood. b, Fold change in 
the absolute number of the indicated lineages in peripheral blood 3 weeks after 
tumour challenge. Means ± s.e.m. (24 h SR1, n = 5; 48 h SR1, n = 6; 48 h, 
n = 5). c, Counts of GFP+ and GFP− T and NK cells in the peripheral blood of 
transplanted NSG mice before (top) and 3 weeks after (bottom) injection of the 
MDA-MB 231 tumour cell line engineered to express human IL-7, IL-15 and 
GM-CSF. Fold changes calculated from these values are plotted in Fig. 4e. 
d, Tumour growth in mice transplanted (n = 16) or not (n = 3) with treated 
CD34+ cells. ****P < 0.0001 (two-way ANOVA). 

Extended Data Figure 6 | Generation of T cells with substantial Vβ TCR 
diversity from the engrafted gene-targeted HSPCs. a, Analysis of TCR Vβ 
repertoire performed on PBMCs from a healthy donor. The histogram shows 
the frequency distribution of the different complementarity-determining 
region 3 (CDR3) lengths identified within the indicated Vβ families. As 
expected from a highly polyclonal TCR repertoire, all Vβ families display a 
Gaussian distribution of the CDR3 lengths. b, Frequency distribution of the 
different CDR3 regions measured as in a for the GFP+ (left) and GFP− (right) T 
cells harvested from the transplanted C1, C5 and A4 mice from Extended Data 
Fig. 5c. c, Complexity score assigned to each Vβ family for the samples shown 
in b. The sum of the scores for all the families is plotted in Fig. 4g. Note that all the 
samples analysed display similar TCR Vβ repertoire distributions, constrained 
for some families and more polyclonal for others, as might be expected for 
human T cells developed in haematochimaeric mice, and that no significant 
differences are observed between the GFP+ and GFP− cells. This finding 
indicates that the rate of gene targeting achieved in the transplanted stem/ 
progenitor cells does not detectably limit the generation of a polyclonal and 
functional T-cell repertoire in vivo. 

Extended Data Figure 7 | Functional and phenotypic characterization of 
IL2RG-edited T cells harvested from transplanted NSG mice. a, Graph 
showing the viability of GFP+ and GFP− T cells harvested from the 
transplanted NSG mice (from Extended Data Fig. 6) or of T cells from 
peripheral blood of healthy donor (HD T cells), cultured in the presence 
(+ cyto) or absence (no cyto) of human IL-7 and IL-15. b, Left: representative 
density plots of GFP+ and GFP− T cells harvested from mice, stained for CD8 
(left) and CD4 (right). Right: GFP+ and GFP− T cells harvested from mice were 
activated ex vivo with beads coated with anti-CD3 and anti-CD28-specific 
antibodies, and cultured with IL-7 and IL-15. CD4 and CD8 composition of 
GFP+ and GFP− cells, measured during ex vivo culture, is shown (n = 3). 
c, Surface phenotype of CD4 and CD8 T cells from b and HD T cells at day 19 
after stimulation. A representative plot (left) and histograms with medians ± 
s.e.m. (right) are shown. T stem memory cells (TSCM) are defined as CD62L+ 
CD45RA+ (refs 42, 43), T central memory (TCM) as CD62L+ CD45RA−, T 
effector memory (TEM) as CD62L− CD45RA− and terminal effectors 
(TEMRA) as CD62L− CD45RA+. d, Production of IL-2 and IFN-γ by GFP+ or 
GFP− T cells as in b, and by HD T cells after 6 h stimulation with 
PMA+ionomycin. A representative plot (left) and percentages of IL-2+ and 
IFN-γ+ cells are shown. P = ns (unpaired t-test). e, IFN-γ Elispot assay 
showing the frequencies of IFN-γ-producing cells from b challenged with the 
MDA-MB 231 tumour cell line at different effector to target ratios. PHA 
stimulation was used as positive control. 

Extended Data Figure 8 | Functionality of γ-chain-dependent signalling 
pathway in IL2RG-gene-edited T cells. GFP+ or GFP− T cells from the 
transplanted mice (as in Extended Data Figs 6 and 7) or T cells from the 
peripheral blood of healthy donor (HD T cells) were exposed to the indicated 
doses of γ-chain-related cytokines. The phosphorylation levels of STAT5 on 
Y694 (pSTAT5), STAT3 on Y705 (pSTAT3) and AKT on S473 (pAKT) were 
measured at the indicated time points by flow cytometry analyses. a, Top: 
representative plots showing pSTAT5, pSTAT3 and pAKT. Each time point of 
analysis is labelled by a different concentration of the intracellular dye PBSE. 
Bottom: fold changes in the levels of pSTAT5, pSTAT3 and pAKT relative to 
the time 0 of stimulation with the indicated maximal doses of IL-15 and IL-2 
(n = 3). Phosphorylation of STAT3 was used as a specificity control. P = ns 
(two-way ANOVA). b, Table showing the fold changes in the levels of pSTAT5 
which are graphically represented in Fig. 4k. Statistics in the first column 
indicate significant change during time. ****P < 0.0001, ***P < 0.001 (two-way 
ANOVA). c, Heat map representing fold changes in pAKT and pSTAT3 
levels after the indicated time of exposure (min) to increasing amounts of IL-2 
or IL-15. P = ns between GFP+ and GFP− cells (two-way ANOVA). 

Figure panels (sequence alignments of retrieved indels with their relative frequencies; x axis: Bases): SLC31A1, sample A, insertions; SLC31A1, sample D, deletions; SLC31A1, sample D, insertions. 

Extended Data Figure 9 | Indel quantification on the intended ‘on’ target 
and candidate ‘off’ target genomic sites of IL2RG ZFNs. a, Indel 
quantification expressed as percentage of the total number of reads for each 
sample and target. The untreated (UT) sample F is shown in the rightmost 
column. Values of 0 indicate events that went undetected owing to low 
sequencing read coverage (see Supplementary Information). b, Table of P-values 
obtained comparing the number of indels for each target and its untreated control 
sample (Fisher’s exact test for contingency data). The P-values distinguish 
real accumulation of indels (labelled with a green arrow close to 
the P-value) from background noise (marked with a yellow horizontal bar). 
Only SCARB1 and SLC31A1 show indels in samples A, B and D at a frequency 
significantly higher than the untreated sample. c, Indel frequency distribution 
along the amplified genomic sequences of the positive control IL2RG and the 
two loci that showed a low but significant off-target activity, SCARB1 and 
SLC31A1. The x axis represents the amplified sequence in base pairs while the y 
axis shows for each base the percentage of reads that reported indels for each 
sample after noise subtraction (see Supplementary Information). Note that 
indels mainly occur in the central region of the amplicon, corresponding to the 
spacer between the genomic sequences expected to be bound by the ZFNs. 
d, Representative sequence alignments of retrieved indels in IL2RG (sample A), 
SCARB1 (from samples A and B) and SLC31A1 (from samples A and D) 
focusing the analysis on the region bearing identity or homology to the 
intended ZFN target sequence. For each sequence type the relative frequency of 
retrieval in the sample is reported as a percentage. As shown in c, indels mostly 
occur in the central spacer region between the two ZFN binding sites (underlined). 
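
The comparison described for panel b can be illustrated with a short sketch. It implements a two-sided Fisher's exact test on a 2 × 2 table of reads with and without indels in a treated sample and its untreated control. The function name and the read counts in the usage note are hypothetical and are not taken from the study, and the legend does not state whether a one-sided or two-sided test was used, so the two-sided form here is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Illustrative layout (an assumption, not the study's pipeline):
      a = treated reads with indels,   b = treated reads without indels
      c = untreated reads with indels, d = untreated reads without indels
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely
    # than the observed one (small tolerance for rounding).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
```

With hypothetical counts such as `fisher_exact_two_sided(40, 9960, 2, 9998)`, the returned P-value is very small, whereas identical indel fractions in both samples give a P-value of 1.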

                                 Percentage (%)   Absolute count (×10^9/l)   Normal values (2-5 months)
White blood cells (WBC)                           6.8
Lymphocytes (within PWS gate)    10.5             0.7                        3.7-9.6
B cells (Igκ/Igλ ratio: 1.4)     [illegible]      [illegible]                0.6-3.0
NK cells                         1.5*             0.01                       0.1-1.3
CD3+ T cells                     0.3*             <0.01                      2.3-6.5
CD4+ T cells                     0.2*             <0.01                      1.5-5.0
CD8+ T cells                     0.02*            <0.01                      0.5-1.6

Flow cytometric BM analysis (within lymphocytes gate)
                                 Markers               Percentage
B cells                          CD22+                 38.5
T cells                          CD3+                  <0.01
NK cells                         CD16.CD56+/CD3−       <0.01
Myeloid cells                    CD13+ and/or CD33+    2.8
[illegible] cells                CD71+/CD45−           36.7
                                 CD71−/CD45−           1.0

Extended Data Figure 10 | IL2RG gene correction in CD34+ cells from the 
bone marrow of a genotyped subject with SCID-X1. a, Blood cell counts in 
the peripheral blood and bone marrow of a SCID-X1 male child carrying the 
R289X mutation in the IL2RG gene (see Supplementary Information) and 
showing virtual absence of T and NK cells. Asterisks in the left-most table 
indicate values calculated within the leukocyte gate. b, Left: representative 
density plots showing the cellular composition of a bone marrow harvest from a 
healthy donor (top) and the subject with SCID-X1 (bottom) after purification 
of nucleated cells (bone-marrow-derived mononuclear cells, BMMCs). 
Myeloid cells are stained for CD15 and CD33, B cells for CD19, T cells for CD3 
and NK cells for CD16. Right: γ-chain expression within the indicated cell 
populations gated from the plots shown on the left. As expected from the 
missense R289X mutation, the γ-chain protein is expressed on the cell surface 
but it is not functional, as indicated by the absence of T and NK cells in the 
patient. In Fig. 5 we show normal expression of the γ-chain protein in the 
myeloid cell progeny of three gene-corrected CFCs from the patient cells (gene 
correction was proven by evidence of targeted integration in the only IL2RG 
allele of this male individual in the clonal CFC progeny and by the expression of 
the fusion transcript bearing the corrective cDNA). These data indicate normal 
γ-chain expression from the reconstituted allele in the edited patient cells. 
c, Bright-field and fluorescence microscopy images of the three GFP+ myeloid 
colonies obtained after IL2RG gene correction of SCID-X1 cells. 

Homologue engagement controls meiotic 
DNA break number and distribution 


Drew Thacker, Neeman Mohibullah, Xuan Zhu & Scott Keeney 


Meiotic recombination promotes genetic diversification as well as pairing and segregation of homologous chromosomes, 
but the double-strand breaks (DSBs) that initiate recombination are dangerous lesions that can cause mutation or meiotic 
failure. How cells control DSBs to balance between beneficial and deleterious outcomes is not well understood. Here we 
test the hypothesis that DSB control involves a network of intersecting negative regulatory circuits. Using multiple 
complementary methods, we show that DSBs form in greater numbers in Saccharomyces cerevisiae cells lacking ZMM 
proteins, a suite of recombination-promoting factors traditionally regarded as acting strictly downstream of DSB 
formation. ZMM-dependent DSB control is genetically distinct from a pathway tying break formation to meiotic 
progression through the Ndt80 transcription factor. These counterintuitive findings suggest that homologous chromosomes 
that have successfully engaged one another stop making breaks. Genome-wide DSB maps uncover distinct responses by 
different subchromosomal domains to the ZMM mutation zip3 (also known as cst9), and show that Zip3 is required for the 
previously unexplained tendency of DSB density to vary with chromosome size. Thus, feedback tied to ZMM function 
contributes in unexpected ways to spatial patterning of recombination. 


DSBs are hazardous genomic damage that most cells avoid but that 
each meiotic cell introduces in large numbers; therefore the activity of 
Spo11, the protein that makes DSBs, must be tightly regulated. Typical 
depictions of recombination pathway(s) (Fig. 1a) implicitly divide involved 
proteins into upstream (DSB formation) and downstream (DSB repair) 
factors. This view suggests that eliminating downstream factors will 
have little or no effect on number or distribution of upstream events (DSBs). 
An alternative view considers recombination genome-wide, not just at 
any one site: DSBs do not form all at once, so fates of early DSBs might 
govern whether and where later DSBs form. In this scenario, downstream 
factors may behave genetically as upstream factors if their absence dis- 
rupts feedback circuits. 

Precedents for feedback are known in several organisms. In mice, 
yeast and flies, ATM kinase governs a negative feedback loop inhibiting 
DSB formation in response to breaks. In mice, flies and worms, defective 
interhomologue interactions are known or proposed to allow continued 
DSB formation, suggesting another type of feedback. The logic 
of these circuits predicts different behaviour in DSB repair mutants: the 
ATM-type circuit should suppress further DSB formation if existing 
breaks cannot be repaired, but, conversely, defective interhomologue 
interactions caused by repair defects might instead allow more DSBs to 
accumulate. We test these predictions here and ask whether feedback 
contributes to the spatial organization of recombination. We focus on 
the ZMM proteins in Saccharomyces cerevisiae (Zip1-4 (Zip4 also 
known as Spo22), Msh4-5, Mer3, Spo16 and Pph3). These biochemically 
diverse factors shepherd recombination intermediates towards a 
crossover fate and help build synaptonemal complexes; therefore, ZMM 
null mutations cause recombination and synaptonemal complex defects, 
with varying degrees of meiotic arrest. 


Increased DSB numbers in ZMM mutants 

We measured DSBs by Southern blotting whole chromosomes separated 
on pulsed-field gels (Fig. 1b, c). In wild type, chromosome fragments 
appeared and disappeared as DSBs were formed and repaired, as expected. 
By contrast, broken chromosomes accumulated in zip3 mutants for 
at least 2 h after DSBs waned in wild type, reaching a plateau 1.7-fold 
higher than the wild-type peak and persisting for hours (Fig. 1c). (This 
underestimates DSBs because zip3 mutants complete some repair (below).) 
In zip1 mutants, broken chromosomes reached higher levels than wild 
type before disappearing, and in msh5 mutants, time-averaged DSB 
levels were higher than in wild type but the maximum was only slightly 
increased (Fig. 1c and Extended Data Fig. 1). Differences in arrest may 
account for variation between mutants at later times (Supplementary 
Discussion). In principle, increased steady-state DSBs could reflect extended 
lifespan, increased frequency, or both. As these measurements cannot 
distinguish between these possibilities, we applied a battery of methods 
that offset limitations of any one approach. 

To assess DSBs globally and mitigate uncertainty from repair defects, 
we examined Spo11-oligonucleotide (oligo) complexes, by-products of 
DSB formation that can be used to measure DSB number and distribution 
(Fig. 1a). Extracts were prepared from cultures expressing 
phenotypically normal Flag-tagged Spo11 (Extended Data Fig. 2). Anti-Flag 
immunoprecipitates were labelled with terminal transferase and 
[32P]dCTP, resolved by SDS-polyacrylamide gel electrophoresis (SDS-PAGE), 
then labelled Spo11-oligo complexes were detected by phosphorimager 
and total Spo11 was detected by western blotting (Fig. 1d). 
In wild type, Spo11-oligo complexes appeared contemporaneously with 
DSBs, peaked at ~4 h, then declined (Fig. 1d, e). In zip3 mutants, Spo11-oligo 
complexes first appeared with similar timing and levels as wild 
type, but continued to accumulate after 4 h (Fig. 1d, e). Spo11-oligo 
levels reached a maximum at ~5 h that was 1.8-fold higher than the 
wild-type peak and remained high after most complexes had disappeared 
in wild type (altered accumulation of free Spo11 protein is probably 
due to arrest (Supplementary Discussion)). Similar patterns were 
seen in msh5 and zip1 mutants (Fig. 1e and Extended Data Fig. 3). If 
turnover of DSBs and Spo11-oligo complexes is separable, these findings 
imply that ZMM mutants make more breaks. 


1Molecular Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA. 2Weill Graduate School of Medical Sciences of Cornell University, New York, New York 10065, USA. 
3Howard Hughes Medical Institute, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA. 

Figure 1 | More DSBs form in ZMM mutants. a, Spo11 generates a covalent 
protein-linked DSB; endonucleolytic cleavage releases Spo11 bound to a short 
oligo (detection method on left). Resection is followed by strand invasion and 
ZMM-dependent stabilization of intermediates fated to become crossovers. 
b, c, Representative pulsed-field gel Southern blots probed for chromosome 
IX are shown in b and Poisson-corrected DSB quantification shown in 
c (mean ± s.d., 3 cultures). P, parental; W, wells. d, e, Representative Spo11-oligo 
complex time courses are in d and quantification in e (mean ± s.d. for 3 
cultures, except at 10 h for msh5 and zip3 analyses (1 culture)). Radiolabelled 
Spo11-oligo complexes were detected by autoradiography (top) and total 
Spo11 was detected by anti-Flag western blot (WB, middle). The main labelled 
species differ in oligo size. Nearly all of the western blot signal is Spo11 that 
has not made a DSB. Asterisk indicates species co-migrating with upper 
Spo11-oligo complexes; arrowhead represents proteolytic product. Bottom, 
extract samples run separately and stained with Coomassie control for input to 
the immunoprecipitates. In panels c and e, mutants are plotted with wild-type 
data collected in parallel. 
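
The ‘Poisson-corrected’ quantification in panel c can be sketched as follows. A probe near one chromosome end reports only the break closest to it, so a chromosome carrying several DSBs is counted once. If breaks are assumed to follow a Poisson distribution, the mean number of DSBs per chromosome is the negative natural logarithm of the unbroken (parental) fraction. The exact procedure used for Fig. 1c is not given in this section, so this formula, the function name and the signal values are assumptions for illustration only.

```python
import math


def poisson_corrected_dsbs(parental_signal, broken_signal):
    """Estimate mean DSBs per chromosome from Southern blot lane signals.

    Assumes breaks are Poisson distributed, so that the unbroken fraction
    equals exp(-mean). Both arguments are background-subtracted signal
    intensities in arbitrary units (hypothetical inputs).
    """
    total = parental_signal + broken_signal
    if parental_signal <= 0 or total <= 0:
        raise ValueError("parental signal must be positive")
    unbroken_fraction = parental_signal / total
    return -math.log(unbroken_fraction)


# Hypothetical lane: 80% of the signal is in the parental band.
observed_broken_fraction = 20.0 / (80.0 + 20.0)
corrected = poisson_corrected_dsbs(80.0, 20.0)
```

In this toy lane the corrected value (about 0.22 DSBs per chromosome) is slightly higher than the observed broken fraction (0.20), and the gap widens as more chromosomes are broken.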


If ZMM mutants make more breaks, then more repair products 
should also accumulate. To gauge interhomologue recombination, we 
used strains heterozygous for different arg4 mutations (Fig. 2a). Prophase 
cells transferred to rich medium abort meiosis, often completing 
recombination even if unable to do so while still in meiosis. All ZMM 
mutants tested except msh5 formed more Arg+ prototrophs than wild 
type (Fig. 2b). Increased recombination has been reported in all ZMM 
mutants examined (including msh4 and msh5), but was not interpreted 
as evidence for increased DSB frequency (Supplementary 
Table 1). Thus, a context-dependent hyper-rec phenotype is a common 
but previously unrecognized property of ZMM mutants. 

We explored this hyper-rec behaviour by quantifying recombinants 
at three natural DSB hotspots (Fig. 2c and Extended Data Fig. 4a, b). At 
each, allelic copies have different flanking and central restriction sites. 
Crossovers and parental length fragments are resolved by electrophor- 
esis after digestion with flanking enzymes, then DNA is digested in the 
gel with the central enzyme before electrophoresis in the orthogonal 
dimension. Noncrossover gene conversion molecules co-migrate with 
one parent in the first dimension but have a central restriction site match- 
ing the other parent (Fig. 2d and Extended Data Fig. 4c-e). Key features 
are that the hotspots are high-intensity with few/no other DSB sites nearby, 
and central polymorphisms are positioned to make incorporation into 
heteroduplex DNA likely. At CCT6 and ERG1, recombinant molecules 
were 1.7-2.5-fold more abundant in the ZMM mutants tested, 
with increased noncrossovers and crossovers at or below wild-type 
levels (Fig. 2e). At GAT1, zip3 mutants displayed fewer crossovers offset 
by more noncrossovers, for a net frequency comparable to wild type 
(Fig. 2e). (Refer to Supplementary Discussion of gene conversion tracts 
and sister chromatid recombination.) These findings reinforce the con- 
clusion that ZMM mutations cause a hyper-rec phenotype that is vari- 
able between loci. 

DSBs were present at CCT6 and GAT1 in zip3 mutants at late times, 
well past the point when most DSBs had disappeared in wild type 
(Fig. 2f, g and Extended Data Fig. 5a). Recombination intermediates 
(joint molecules) that are transient in wild type were also detected 
late in zip3. DSBs and joint molecules at CCT6 were also present in 
zip1 and msh5 mutants later than in wild type but not as long as in 
zip3, similar to analysis of breaks on chromosome IX (Extended Data 
Fig. 5b-e). These results agree with data at artificial hotspots in ZMM 
mutants (for example, ref. 20), but it was not previously possible to evalu- 
ate whether DSB numbers were increased and most previous studies 
dismissed or did not consider this possibility (Supplementary Table 1). 
We can now combine DSB data with quantification of recombination 
intermediates and products (Supplementary Table 2): this bookkeep- 
ing reveals that msh5, zip1 and zip3 mutants experience 1.8-2.6-fold 
more detectable DSB-related events at CCT6. Recombination product 
overabundance yields the same conclusion for zip3 and msh5 at ERG1 
(1.7-1.9-fold), whereas wild type and zip3 mutants had similar totals 
at GAT1. We conclude that ZMM mutants incur more DSBs, but to 
varying degrees at different loci. 


Separate pathways controlling DSB number 


Recombination products and DSBs accumulate in cells lacking Ndt80, 
a transcription factor controlling pachytene exit, and therefore it 
has been suggested that this stage in prophase ends a period permissive 
for DSB formation, a view further supported by recent studies. Indeed, 
Spo11-oligo complexes reached 1.2-1.4-fold higher than the wild-type 
maximum and remained high through late time points in ndt80 mutants 
(Fig. 3a, b and Extended Data Fig. 6). Heteroallele recombination was 
also increased (1.6-fold, Fig. 2b). Pachytene delay/arrest via Ndt80 inhibition 
is a hallmark of ZMM mutants, suggesting that increased DSBs 
might be an indirect consequence of arrest, perhaps analogous to 
increased DSB numbers when CHK-2 kinase activity is prolonged in 
Caenorhabditis elegans. If so, then ZMM mutations should cause 
no change if Ndt80 is absent. However, more Spo11-oligo complexes 
(Fig. 3a, b and Extended Data Fig. 6) and heteroallele recombinants 
(Fig. 2b) accrued in zip3 ndt80 and msh5 ndt80 double mutants than in 
ndt80 single mutants. Furthermore, msh5, zip1 and zip3 had similarly 
increased Spo11-oligo complexes (Fig. 1e) despite different arrest phenotypes 
(Fig. 3c). Thus, although the ZMM mutant DSB phenotype is 
probably influenced by the combined effects of Ndt80 inhibition and a 
hyperactivated DNA damage response, meiotic arrest per se does not 
explain ZMM mutant-provoked DSB increase. 

Instead, we infer that a ZMM-dependent process(es) is more directly 
responsible for inhibiting DSB formation. A plausible mechanism is that 
chromosomes that have engaged their homologues undergo structural 
changes that render them unfit Spo11 substrates. ‘Homologue engagement’ 
could mean synaptonemal complex formation and/or progression 
of crossover-designated recombination intermediates, both promoted 
by ZMMs. Supporting this model, DSB-promoting factors Hop1 and 
Red1 accumulate on chromosomes in ZMM mutants, proteins required 
for DSBs are displaced from pachytene chromosomes in wild-type yeast 
(for example, ref. 29), and Hop1 orthologues are displaced after synapsis 
in yeast and mouse. 

Hyper-rec behaviour in ZMM mutants is reconciled with tetrad data 
demonstrating globally reduced crossing over (for example, ref. 30) by 
noting that there is a reduced per-DSB likelihood of generating a cross- 
over that offsets increased DSBs (Supplementary Discussion). Our findings 

Figure 3 | Separable effects of ndt80 and ZMM mutations. a, b, Spo11-oligo 
complex labelling from representative time courses is in a and quantification 
from ≥3 cultures (mean ± s.d.) is in b. c, Meiotic progression (percentage of 
cells completing the first division). 


separation-of-function (rad50S) or dmc1 backgrounds (Extended 
Data Fig. 7a, b). Dmc1 is an essential strand exchange protein and 
rad50S mutants cannot remove Spo11 from DSB ends. As these mutations 
block recombination before ZMM proteins act, they are uninformative 
for querying ZMM mutant effects. This caveat may also apply 
to recombination-defective mutants in other organisms. 


Shaping the DSB landscape 


If ZMM-dependent feedback works via chromosome structure changes 
linked to homologue engagement, then it should be spatially patterned. 
We tested this by deep-sequencing Spo11 oligos to map DSBs (Fig. 4a 
and Supplementary Table 3). Control cultures with a fully functional 
Spo11-protein A fusion agreed with each other and previous results 
(Extended Data Figs 2, 8 and data not shown). The DSB ‘landscape’ is 
shaped by combinatorial action of many factors that operate hierarchically. 
At short scales (sub-kilobase (kb)), the landscape is dominated 
by hotspots, mostly in nucleosome-depleted promoters. This pattern 
was unaffected in zip3 mutants, in that DSBs formed in the same hotspots 
(Fig. 4a, Extended Data Figs 4f, 8c and Supplementary Table 4). 
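
Hotspot profiles such as those in Fig. 4a are described as smoothed with a 201-bp Hann window. A minimal sketch of that kind of smoothing on a per-base count vector is given below. The function names, the handling of sequence edges (weights renormalized over the positions that exist) and the toy input are assumptions; they are not taken from the authors' analysis code.

```python
import math


def hann_window(width):
    """Return symmetric Hann weights for an odd window width (in bp)."""
    if width < 3 or width % 2 == 0:
        raise ValueError("width must be an odd integer >= 3")
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (width - 1))
            for k in range(width)]


def hann_smooth(counts, width=201):
    """Smooth per-base counts with a Hann-weighted moving average.

    Near the ends of the sequence only the overlapping part of the
    window is used, and the weights are renormalized to sum to 1.
    """
    weights = hann_window(width)
    half = width // 2
    smoothed = []
    for i in range(len(counts)):
        weighted_sum = 0.0
        weight_total = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(counts):
                weighted_sum += w * counts[j]
                weight_total += w
        smoothed.append(weighted_sum / weight_total)
    return smoothed


# Toy example: a single 4-read spike smoothed with a 5-bp window.
toy_profile = hann_smooth([0, 0, 4, 0, 0], width=5)
```

A flat profile is left unchanged by this procedure, whereas an isolated spike is spread over neighbouring positions, which is the intended effect when comparing hotspot shapes between maps.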

On larger scales, however, zip3 showed substantial alterations. Smaller 
chromosomes form more crossovers per unit length than larger ones 
because of variation in DSB levels, but what controls DSB differences 
has been unclear. Remarkably, zip3 mutation eliminated the normal 
inverse correlation between Spo11-oligo density and chromosome length 
(Fig. 4b). If the zip3 map is scaled by 1.8-fold (on the basis of peak Spo11-oligo 
levels, Fig. 1e), all chromosomes had more DSBs but larger ones 
went up disproportionately (Fig. 4c). Thus, ZMM-dependent feedback 
is necessary for length-dependent recombination variation in wild type. 
Perhaps the number (not density) of DSBs governs speed or efficiency 
of homologue engagement: if so, smaller chromosomes might tend to 
have more time to accumulate DSBs. A nonexclusive possibility is that 

Figure 4 | Altered DSB distribution in zip3 mutants. a, Top, reproducibility 
of Spo11-oligo maps. Bottom, DSBs form at the same hotspots in zip3 as 
wild type. Smoothed with 201-bp Hann window. b, Zip3 is required for 
chromosome size-dependent variation in Spo11-oligo density. Lines, least 
squares fits (dashed denotes nonsignificant). c, Larger chromosomes 
experience greater increase in Spo11 oligos. Fold change is the per-chromosome 
Spo11-oligo density in zip3 over wild type (WT). Open 
circles, chromosome XII (‘12’, omitting rDNA length) and the portions of 
chromosome XII left or right of the rDNA (‘12L’, ‘12R’). Regression line treats 
12L and 12R as separate chromosomes. d, Regional variation in response to zip3 
mutation. Each point is the change at a hotspot (plotted on log scale). Red lines, 
local regression (loess); green circles, centromeres. e, Local domains of 
correlated behaviour. Each point compares hotspots to their neighbours in 
5-kb-wide windows the indicated distance away. Nearby hotspots show 
correlated behaviour for fold change in zip3 (red), but not heat (Spo11-oligo 

DSB suppression spreads far relative to chromosome length, with longer 
chromosomes providing more spreading room. 
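
The chromosome-size analysis can be sketched in a few lines: correlate per-chromosome Spo11-oligo density with chromosome length, then express the zip3 density relative to wild type after scaling the mutant map by 1.8-fold. The function names and all numbers below are hypothetical toy values chosen to show the calculation; they are not data from Fig. 4b, c.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with >= 3 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def scaled_fold_change(wt_density, mutant_density, scale=1.8):
    """Per-chromosome mutant/wild-type density ratio after global scaling.

    `scale` converts a normalized mutant map to absolute levels; 1.8 follows
    the peak Spo11-oligo ratio quoted in the text.
    """
    return [scale * m / w for w, m in zip(wt_density, mutant_density)]


# Hypothetical toy chromosomes (lengths in kb, densities in arbitrary units).
lengths_kb = [200.0, 600.0, 1000.0]
wt_density = [5.0, 3.0, 1.0]      # density falls with length in wild type
zip3_density = [3.0, 3.0, 3.0]    # no length dependence in the mutant

r_wild_type = pearson_r(lengths_kb, wt_density)
fold_change = scaled_fold_change(wt_density, zip3_density)
```

In this toy case the wild-type correlation is strongly negative and the scaled fold change rises with chromosome length, which is the qualitative pattern the text describes for larger chromosomes.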

Subchromosomal domains differed in response to zip3 mutation: 
Spo11-oligo frequencies increased less than average in 20-kb zones at 
telomeres and centromeres (where few DSBs form in wild type), 
and were unchanged or reduced near the ribosomal DNA (rDNA), causing 
chromosome XII to be an outlier in whole-chromosome analysis 
(Fig. 4c, d and Extended Data Fig. 9a). The remaining interstitial regions 
varied widely, with local regression along chromosomes suggesting 
alternating domains of greater or lesser change (Fig. 4d). Supporting 
this conclusion, the change in each hotspot correlated with the change 
in hotspots located nearby, with correlation strength decaying with distance 
(Fig. 4e). 

To better understand these domains, we compared Spo11-oligo maps 
to chromosomal features including the distribution in wild type of Zip3 
protein, chromosome structure proteins needed for normal DSBs (Hop1, 
Red1, Rec8) and proteins essential for Spoll activity (Mei4, Mer2, 
Rec102, Rec104, Rec114), previously defined by chromatin immuno- 
precipitation (ChIP)**”’. The magnitude of change in Spo11-oligo den- 
sity in zip3 correlated with enrichment of Hop1, Rec114, Mei4, Mer2 
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Observed fold change 


Progression through meiosis 


frequency) in wild type (black). Shaded areas denote 95% confidence 
estimates for hotspots randomized within-chromosome (randomized r > 0 
for zip3-fold change because of the chromosome size effect). f, Correlation 
between log-fold change in zip3 and binding of indicated proteins, binned in 
non-overlapping windows of varying size. For clarity, other proteins are in 
Extended Data Fig. 9b. Pericentric, subtelomeric and rDNA-proximal regions 
were censored. Closed symbols, P < 0.05. g, Fit of multiple regression model 
predicting changes (log scale) in Spol 1-oligo density in 35-kb windows from 
ChIP data, G+C content and chromosome size (Supplementary Table 5). 
Dashed lines, observed mean fold change. h, Network of feedback circuits 
controlling DSB formation. Circuit 1: DSBs activate Tell (ATM in mouse), 
which inhibits further DSB formation. Circuit 2: ZMM-dependent interactions 
between homologous chromosomes inhibit Spol 1. Circuit 3: Ndt80 shuts down 
DSB formation and drives pachytene exit; Mec1 kinase delays or blocks 
Ndt80 activation when DSBs are present. 


and Red1, with highest correlation for binning windows =20 kb (Fig. 4f 
and Extended Data Fig. 9b, c, e). The distributions of these proteins 
are themselves correlated (Extended Data Fig. 9d). We infer that large
domains (tens of kb wide) enriched for these proteins tend to be more
responsive to ZMM-dependent feedback. G+C content, Spo11-oligo
density in wild type, and distributions of Rec8, Rec102 and Rec104 were
uncorrelated or weakly anti-correlated when considered individually
(Fig. 4f and Extended Data Fig. 9b, c, e, f). However, we observed a
strong scale-dependent correlation with the distribution Zip3 displays
when most DSBs have formed and homologues are engaging (Fig. 4f
and Extended Data Fig. 9c, e). Zip3 localizes to subsets of
recombination sites, so the positive correlation between Zip3 accumulation in
wild type and altered DSB frequency in zip3 suggests that Zip3 inhibits 
DSB formation, directly or indirectly, at sites of homologue engagement. 
Multiple regression indicates that these chromosomal features, plus 
chromosome size, explain ~40% of the variation in zip3-induced DSB 
change (Fig. 4g and Supplementary Table 5). Our findings elucidate 
the locus-to-locus variability of ZMM mutant hyper-rec behaviour 
and reveal that ZMM-dependent feedback shapes the DSB landscape 
in wild type.

Conclusions


We propose that the logic of DSB control involves a drive towards 
DSB formation that is restrained quantitatively, spatially and tempor- 
ally by distinct but intersecting negative influences (Fig. 4h). We note 
several implications. First, Spo11 catalytic potential exceeds what is
realized in any one meiosis. Thus, DSB numbers may underestimate
severity of biochemical defects in mutants. Second, counterintuitive
effects arise when feedback loops are severed or hyperactivated, for
example, in dmc1 or rad50S backgrounds. The ZMM mutations likely
impinge on multiple circuits simultaneously, removing restraints on
Spo11 activity by disrupting homologue engagement and inhibiting
Ndt80 activation, but also hyperactivating negative regulatory circuits
via the DNA damage responsive kinase Tel1 (and possibly Mec1).
This ‘push-me-pull-you’ interplay undoubtedly affects the final number
and distribution of DSBs. Our results support the conclusion that
crossovers in ZMM mutants are not identical in number and provenance
to crossovers that form without ZMM intervention in wild type.
Third, our findings explain puzzling aspects of set1 mutant yeast and
Prdm9−/− mutant mice. If DSB number control is separate from Spo11
targeting (which requires Set1 or PRDM9 (ref. 40)), then the default for
Spo11 to make breaks until restrained by feedback explains why DSBs
form in relatively normal numbers but different locations in these mutants. 
This also undermines more extreme versions of the ‘hotspot paradox’ 
in which biased gene conversion is predicted to eliminate all hotspots 
over time and thereby prevent DSB formation (for example, ref. 41): 
the logic of DSB control makes it impossible for inactivation of indi- 
vidual hotspots to render chromosomes immune to Spo11. Fourth, our 
findings support the hypothesis that altered DSB distributions tied to 
feedback control are the source of altered recombination distributions 
caused by certain mutations or heterozygosity for large-scale chromosome 
structure variants in other species. Finally, we speculate that organisms
such as mouse readily form synaptonemal complexes between nonho- 
mologous chromosome segments late in prophase (for example, ref. 8) 
as a means to shut down unproductive DSB formation in karyotypi- 
cally unbalanced cells. 

Our findings provide a holistic view of DSB control in the broader 
context of meiotic chromosome dynamics and meiotic progression, 
and explain how DSB formation is homeostatic and therefore robust 
against cell-to-cell variation, environmental perturbation, and chromo- 
some variants encountered in outcrosses. 


METHODS SUMMARY 


Yeast strains are of the SK1 background (Supplementary Table 6). Synchronized 
meiotic cultures were prepared according to standard methods. Labelling of
Spo11-oligo complexes and purification, amplification and sequencing of
Spo11 oligos were carried out using methods adapted from previous studies.


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper.

METHODS
Yeast strains and plasmids. Strains were of the SK1 background (Supplementary 
Table 6). The zip1 deletion and one of the ndt80 deletions (ndt80Δ::LEU2) were
provided by N. Kleckner, the msh5 and zip3 deletions were provided by N.
Hunter, the spo16 deletion was provided by A. Shinohara, and a second ndt80
deletion (ndt80Δ::kanMX4) was provided by S. Burgess. The dmc1, pph3 and zip4
deletions were made by replacing the coding sequence with the hygromycin B 
phosphotransferase gene (hphMX4). Gene disruption was verified by PCR. All 
mutants analysed were moved into the desired tester strain backgrounds by 
crossing and tetrad dissection. The SPO11-Flag strain was provided by K. Ohta 
and the protein A tagging construct was provided by M. Rout. The constructs for 
two-dimensional gel electrophoresis analysis of crossover and noncrossover 
recombinants at CCT6, ERG1 and GAT1 were engineered by a series of two-step
gene replacements. For CCT6 on chromosome IV, SalI sites were introduced in
intergenic regions at Saccharomyces Genome Database (SGD) coordinates 832534
and 838251 in one strain; SalI sites were introduced separately at coordinates 833537
(in YDR186c) and 837893 (in CCT6) in another strain along with a SmaI site between
YDR186c and CCT6 at coordinates 835802 and 835803. For ERG1 on chromosome
VII, SacII sites were introduced in intragenic regions at coordinates 844276 (in
RBG2) and 854464 (in OKP1) along with a SalI site at coordinate 848724 (between
ERG1 and ATF2). In a separate strain, SacII sites were introduced at coordinate
845470 (intergenic) and coordinate 852145 (in PBP1). For GAT1 on chromosome
VI, KpnI sites were introduced at coordinates 90967 and 100083 (both intergenic)
along with a BamHI site between FRS2 and GAT1 at coordinates 95715 and 95717.
Separately, KpnI sites were introduced at coordinates 92986 (BUD27) and
coordinate 98899 (intergenic). Further details are in ref. 42 and available on request.
Culture methods. With the exception of Spo11-oligo mapping, synchronous
meiotic cultures were prepared as described. In brief, cells were grown in YPA (1%
yeast extract, 2% Bacto Peptone, 1% potassium acetate) for 13.5–14 h at 30 °C,
collected, re-suspended in 2% potassium acetate, and sporulated at 30 °C.
Meiotic division profiles. Aliquots were collected from synchronous meiotic 
cultures and fixed in 47.5% (v/v) ethanol and 0.05 µg ml⁻¹ DAPI. Mono-, bi- and
tetranucleate cells were scored by epifluorescence microscopy. 
Direct DSB measurements and heteroallele recombination analysis. High- 
molecular-weight DNA was prepared and separated by pulsed-field gel
electrophoresis as described. DNA was probed with part of the CHA1 open reading
frame (SGD coordinates 15838 to 16857), SKI8 (coordinates 90062 to 91228),
YHL042w (coordinates 15671 to 16112) or POT1 (coordinates 40223 to 40728). DSB
analyses at CCT6, ERG1 and GAT1 were performed as described. Blots were
quantified by phosphorimager. For the DMC1+ pulsed-field DSB analyses, the signal
above the parental band (including the well) was split between the parental and
DSB signals. For quantification in Fig. 1c and Extended Data Fig. 7b, our main
interest was absolute DSB levels, so we calculated the average number of DSBs per
chromatid assuming a Poisson distribution of breaks among and along chromatids
in the population: $P(n) = \mu^{n} e^{-\mu}/n!$ (where $\mu$ is the mean number of DSBs per
chromatid and $P(n)$ is the probability that $n$ DSBs occur on a single chromatid).
The observed parental-length signal ($U_{\mathrm{obs}}$, for ‘unbroken’) approximates the true
unbroken fraction (that is, $P(0) \approx U_{\mathrm{obs}}$), so the mean total number of DSBs per
chromatid in the population ($\mathrm{DSB}_{\mathrm{total}}$) can be estimated as $-\ln(U_{\mathrm{obs}})$. This
calculation helps correct for multiple DSBs on the same chromatid. A full description
of the method, and confirmation that it does not overestimate DSB numbers, will 
be provided elsewhere (H. Murakami & S.K., unpublished observations). 
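
As a minimal illustration of the Poisson correction described above, the following Python sketch converts an observed unbroken (parental) fraction into an estimated mean number of DSBs per chromatid. The function names and the example value of 0.60 are hypothetical and are not taken from this study.

```python
import math


def dsbs_per_chromatid(unbroken_fraction: float) -> float:
    """Poisson estimate of mean DSBs per chromatid: mu = -ln(P(0))."""
    if not 0.0 < unbroken_fraction <= 1.0:
        raise ValueError("unbroken fraction must lie in (0, 1]")
    return -math.log(unbroken_fraction)


def poisson_probability(n: int, mu: float) -> float:
    """P(n) = mu^n * exp(-mu) / n! for n breaks on one chromatid."""
    return mu ** n * math.exp(-mu) / math.factorial(n)


# Hypothetical lane: 60% of the signal remains in the parental band.
u_obs = 0.60
mu = dsbs_per_chromatid(u_obs)
# Fraction of chromatids expected to carry two or more DSBs; a simple
# "broken fraction" measurement would count each of these only once.
multi_hit = 1.0 - poisson_probability(0, mu) - poisson_probability(1, mu)
print(f"broken fraction = {1 - u_obs:.2f}, estimated DSBs per chromatid = {mu:.2f}")
print(f"chromatids with >= 2 DSBs = {multi_hit:.3f}")
```

Because the estimate always exceeds the simple broken fraction when breaks are present, the correction matters most when DSB levels are high.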
Because relative DSB levels were our main interest for data in Extended Data 
Fig. 2, we did not apply a Poisson correction and expressed DSBs instead as 
detectable broken DNA as per cent of total DNA in the lane. For the locus-specific 
DSB analyses the signal between the parental band and the wells was measured 
and apportioned evenly between the parental and DSB values. The frequency of 
meiotic recombination at ARG4 was determined by heteroallele recombination 
analysis as described.
Two-dimensional gel electrophoresis analysis of crossovers and noncross- 
overs. Cultures were grown and sporulated as described above at 30 °C. Samples 
(15 ml) were collected at 10 h and washed twice with 5 ml of 50 mM EDTA, pH 8.0. 
For analyses at CCT6 and ERG1, DNA was prepared in agarose plugs as described.
For analysis at GAT1, DNA was prepared for conventional agarose electrophoresis.
DNA embedded in agarose plugs was digested with the appropriate restriction
endonuclease (CCT6, SalI; ERG1, SacII), then electrophoresed at room
temperature (20–23 °C) for 24–26 h at 1.7 V cm⁻¹ on 0.5% agarose in 0.5× TBE
(Tris/borate/EDTA). DNA prepared for conventional agarose electrophoresis (GAT1)
was digested with KpnI, then electrophoresed at room temperature for 24 h at
1.7 V cm⁻¹ on 0.5% agarose in 1× TBE. A ~10.5 cm gel slice containing the region
of interest was then excised. For analysis at CCT6 and ERG1 the gel slice was washed
twice in the appropriate NEBuffer supplemented with 20 µg ml⁻¹ BSA. For analysis
at GAT1 the gel slice was washed twice in 10 mM Tris-HCl, pH 8.1, followed by one
wash in NEBuffer 4 supplemented with 100 µg ml⁻¹ BSA. Liquid was then replaced
with fresh NEBuffer supplemented with 20 µg ml⁻¹ BSA or 100 µg ml⁻¹ BSA, then
4,900–5,000 units of the appropriate restriction endonuclease (CCT6, SmaI; GAT1,
BamHI; ERG1, SalI) was added and incubated first at 4 °C overnight then at the
optimal incubation temperature for ~24 h. The gel slice was then cast in a 0.6%
agarose gel in 0.5× TBE (CCT6 and ERG1) or 1× TBE (GAT1), then
electrophoresed perpendicular to the first dimension at ~1.3 V cm⁻¹ for ~20 h at room
temperature. DNA was probed with part of the CCT6 open reading frame (SGD
coordinates 837413 to 837865), part of the GAT1 open reading frame (SGD coor- 
dinates 95968 to 97490 or SGD coordinates 96500 to 97491), or part of the PBP1 
open reading frame (SGD coordinates 851379 to 851869). 

End-labelling of Spo11-oligo complexes and western blot analysis. Lysates and
extracts were prepared as previously described. Immunoprecipitation of
Spo11-oligo complexes was performed using 5 µg of mouse monoclonal anti-Flag M2
antibody (Sigma). Precipitated Spo11-oligo complexes were end-labelled in NEBuffer
4 (New England Biolabs) containing 3–10 µCi of [α-³²P]dCTP and terminal
deoxynucleotidyl transferase (TdT). 25 µl of reaction mixture was added to the beads,
mixed, and incubated at 37 °C for 1–2 h. Spo11-oligo complexes were eluted by
adding 25 µl of NuPAGE loading buffer (diluted to 2× and supplemented with
83.3 mM dithiothreitol (Invitrogen)) and boiling for 5 min. End-labelled
Spo11-oligo complexes were separated on a Novex 4–12% gradient denaturing
polyacrylamide gel (Invitrogen) then transferred onto PVDF membrane using the iBlot
protocol (Invitrogen) and visualized by phosphorimager. Blots were probed with
mouse monoclonal anti-Flag M2 conjugated to horseradish peroxidase (Sigma).
Chemiluminescent detection was performed according to the manufacturer’s
instructions (ECL+ or ECL Prime, Amersham). Protein quantity was estimated by
separating 1 µl of extract on a Novex 4–12% gradient denaturing polyacrylamide gel
and staining with Coomassie blue.

Spo11-oligo purification for mapping. Spo11 oligos were prepared for
sequencing similar to methods described previously, with modifications. Haploid strains
with Spo11 C-terminally tagged with five copies of the protein A tag were patched
from a frozen stock onto a YP-glycerol plate and grown at 30 °C overnight to select
for respiration competence. Cells were mated on YPD (1% yeast extract, 2% Bacto
Peptone, 2% dextrose) plates then streaked for single colonies and grown for 48 h at
30 °C. A single colony was inoculated into 5 ml liquid YPD medium and grown
overnight at 30 °C. The saturated YPD culture was used to inoculate 25 ml liquid
SPS medium (0.5% yeast extract, 1% peptone, 0.67% yeast nitrogen base without
amino acids (Difco), 1% potassium acetate, 0.05 M potassium biphthalate, pH 5.5)
to OD₆₀₀ 0.8 and grown for 7 h at 30 °C. This culture was used to inoculate 1 l SPS
medium in a 2.8 l baffled Fernbach flask to OD₆₀₀ 0.05. Flasks were incubated at
30 °C for 12–16 h, to OD₆₀₀ 4.5–6. Cells were collected by centrifugation, washed
once in deionized water, re-suspended in 0.6 l sporulation medium (2% potassium
acetate and 0.001% antifoam 204) and incubated in 2.8 l baffled flasks (0.6 l per
flask) at 30 °C for 4 h (wild type) or 5 h (zip3) to approximate times of peak
Spo11-oligo levels (Fig. 1e).

Cells were centrifuged and washed with 50 mM EDTA, transferred to a 30-ml 
syringe, extruded into liquid nitrogen, and stored at −80 °C. Yeast cell powder was
prepared by placing the frozen paste into canisters of a Retsch MM301 mill
(pre-chilled in liquid nitrogen) and grinding five times for 3 min at 30 Hz. Yeast powder
was transferred to a pre-chilled 50-ml tube and stored at −80 °C. Extract was
prepared by transferring the yeast powder to a pre-chilled 40-ml glass Dounce
homogenizer and homogenizing in two volumes of cold 10% trichloroacetic acid.
Lysate was centrifuged at 14,000 r.p.m. in an SS-34 rotor (Sorvall) for 20 min. The
supernatant was removed and cell pellet was re-suspended in SDS extraction buffer
(2% SDS, 0.5 M Tris-HCl, pH 8.1, 10 mM EDTA, 0.005% bromophenol blue).
β-mercaptoethanol was added to 0.288 M, the extract boiled in a water bath for
5 min, then centrifuged at 14,000 r.p.m. in an SS-34 rotor for 20 min. 

The supernatant was poured into fresh tubes, diluted with an equal volume of 
2× immunoprecipitation (IP) buffer (2% Triton X-100, 30 mM Tris-HCl, pH 8.0,
300 mM NaCl, 2 mM EDTA) and incubated with CL6B-sepharose beads (GE) for
mock IP (4 h at 4 °C mixing end-over-end, 1.5 ml extract per 200 µl beads). Supernatant
was removed into fresh tubes and mock beads were stored on ice. The supernatant
was incubated with 200 µl IgG Sepharose Fast Flow beads (GE) per 1.5 ml of extract
for 4 h at 4 °C mixing end-over-end, then beads were recovered. Mock and IP
beads were washed 3 times with 10 ml cold 1× IP buffer. Protein was eluted from
mock or IP beads with 350 µl 2× NuPAGE LDS buffer (Invitrogen) by boiling for
5 min, followed by a second elution with 350 µl 0.5× NuPAGE LDS buffer. The
eluates were combined and diluted with 700 µl of 2× IP buffer, then incubated
with 200 µl fresh CL6B-sepharose beads (mock) or IgG Sepharose Fast Flow beads
(IP), 4 °C overnight with end-over-end rotation. The beads were recovered and
subsequently washed with 1 ml Proteinase K buffer (100 mM Tris-HCl, pH 7.4, 1 mM
EDTA, 0.5% SDS, 1 mM CaCl₂) lacking SDS, then re-suspended in 600 µl
Proteinase K buffer and 100 µg purified Proteinase K, and incubated overnight
at 50 °C with end-over-end rotation. The supernatant was collected using a
SPIN-X tube (Corning) and ethanol precipitated with 0.3 volume of 9 M ammonium
acetate, 10 µg of DNA-free glycogen and 2.5 volumes of 100% ethanol. Spo11 oligos
were quantified by end labelling with [α-³²P]GTP and TdT (Fermentas) and
comparing to a known quantity of similarly labelled 30-nucleotide synthetic oligo.
Library preparation for sequencing. Approximately 300 fmol of Spo11 oligos
were subjected to GTP tailing at their 3′ ends. Material eluted from mock beads
was processed in parallel to determine specificity of the IP (data not shown).
Tailing was carried out in a total volume of 40 µl containing 1× NEBuffer 4, 20 U
TdT and 13.8 µM GTP at 37 °C for 5 h, followed by heat inactivation of TdT at
75 °C for 10 min. The tailed oligos were ligated to a double-stranded DNA adaptor
optimized for the Illumina HiSeq platform as follows: the tailing reaction was
supplemented by addition of 10× T4 RNA ligase 2 buffer (500 mM Tris-HCl,
pH 7.6, 50 mM MgCl₂, 50 mM β-mercaptoethanol) to 1×, 25 mM ATP to 0.5 mM,
5 pmol double-stranded customized P7 adaptor, 300 fmol T4 RNA ligase 2 (gift
from Stewart Shuman, MSKCC), and dH₂O to a final volume of 50 µl. P7 adaptor
sequences are 5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCCCC
and 5′-pAGATCGGAAGAGCACACGTCTGAACTCCAGTCACppT, where CppT
is an inverted 3′-3′ linkage to block ligation. These oligos were annealed and purified
by non-denaturing polyacrylamide gel electrophoresis before use in the ligation
reaction. Ligation was carried out overnight at room temperature. Complementary
strands of Spo11 oligos were synthesized as follows: the ligation reaction was
supplemented with 2 mM dNTP to a final concentration of 30 µM and 10 U Klenow
polymerase (New England Biolabs), and incubated at 25 °C for 15 min. After Klenow
inactivation (75 °C, 10 min), extension reactions were supplemented with 0.3 volume
of 9 M ammonium acetate, 10 µg of DNA-free glycogen and 2.5 volumes of 100%
ethanol. DNA was precipitated at −20 °C overnight and centrifuged at 16,000g.
The pellet was rinsed with 70% ethanol, air dried, and dissolved in a mixture of 9 µl
water and 15 µl formamide loading buffer. Extension products and 10-bp ladder
(radiolabelled with T4 polynucleotide kinase and [γ-³²P]ATP) were separated on a
10% denaturing polyacrylamide gel. The region between ~55–200 nt (equivalent
to ~10–50-nucleotide Spo11 oligos with (rG)3–5 tails plus ligated adaptor) was
excised, crushed, and eluted in 400 µl 10 mM Tris-HCl, pH 8.0 at 37 °C overnight
with mixing. Elution mixture was spun through a SPIN-X tube, then 0.3 volume of
9 M ammonium acetate, 10 µg DNA-free glycogen, and 2.5 volumes of 100%
ethanol were added. DNA was precipitated on dry ice at −20 °C overnight and
centrifuged at 16,000g. Pellet was rinsed with 70% ethanol and air dried. The 3′ ends of
gel-purified, denatured DNA strands were tailed with GTP by dissolving the dried
pellet in 40 µl tailing reaction containing 1× NEBuffer 4, 30 U TdT and 50 µM
GTP, then incubating at 37 °C for 5 h. The tailed oligos were ligated to a second set
of customized double-stranded DNA adaptors (P5) and complementary strands
were synthesized as above. The P5 adaptor is a mixture of four duplexes. The oligos
for one duplex are 5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGTCTCCCC
(top strand) and 5′-pAGACTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTppT.
The four different P5 duplexes have different sequences at
the underlined positions (top strand: AGTC, GTCA, TCAG, CAGT, respectively), 
which are the first four bases that will be read by the sequencer. Complementary 
pairs of oligos were annealed separately and purified by non-denaturing electro- 
phoresis, then the four duplexes were mixed in approximately equimolar ratio. 
This provides diversity of base composition at the beginning of the sequencing 
reaction. If this diversifier region were not present, the sequencer would encounter 
a homogeneous oligo-C sequence for every read, compromising ability to detect 
individual amplification clusters immobilized on the flow cell surface. 
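
The adaptor design can be checked computationally. The Python sketch below (helper names are hypothetical) confirms that each bottom strand is the reverse complement of its top strand apart from a 3′ CCCC overhang on the top strand, which presumably pairs with the rG tail, and that the four diversifier tags present every base once at each of the first four sequencing cycles. Sequences are copied from the text with the 5′ phosphate and 3′ ppT block omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def top_strand_overhang(top: str, bottom: str) -> str:
    """Return the unpaired 3' end of the top strand of an adaptor duplex."""
    paired = reverse_complement(bottom)
    if not top.startswith(paired):
        raise ValueError("strands are not complementary")
    return top[len(paired):]


def bases_per_cycle(tags):
    """For each read cycle, collect the set of bases seen across all tags."""
    return [set(column) for column in zip(*tags)]


# Sequences from the text (5' phosphate and 3' ppT block omitted).
P7_TOP = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCCCC"
P7_BOTTOM = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
P5_TOP = "ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGTCTCCCC"
P5_BOTTOM = "AGACTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
DIVERSIFIERS = ["AGTC", "GTCA", "TCAG", "CAGT"]

print(top_strand_overhang(P7_TOP, P7_BOTTOM))
print(top_strand_overhang(P5_TOP, P5_BOTTOM))
print(bases_per_cycle(DIVERSIFIERS))
```

The four tags are cyclic rotations of one another, which is why an equimolar mixture gives balanced base composition at each of the first four cycles.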

To estimate the yield, test PCR was carried out in a total of 30 µl containing
1–2% of the final Klenow extension reaction, 1× PCR buffer (Invitrogen), 2 mM
MgCl₂, 0.2 mM dNTP, 1.5 U Taq polymerase (Invitrogen), 1 µM P5 primer
(5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACG,
comprising Illumina sequencing primer and part of P5 adaptor) and 1 µM Indexing
primer (5′-CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTG,
comprising Illumina-specific primer and part of the P7 adaptor; the index
sequence, CGTGAT in this primer, is Illumina HiSeq Index 1 and is replaced as
appropriate with the sequence of other indices according to manufacturer
instructions). The mixture was divided into three tubes and PCR was initiated by a
denaturation step at 94 °C for 10 s, followed by 20 cycles of amplification
(94 °C for 10 s, 60 °C for 10 s and 72 °C for 10 s). PCRs were combined and the
products were electrophoresed on a 10% non-denaturing polyacrylamide gel with
low-molecular-weight DNA ladder (New England Biolabs) to determine the size and
quantity of PCR product after staining with ethidium bromide. Template for
Illumina sequencing was prepared by a large-scale PCR with same conditions as
above (but only 16 cycles instead of 20), scaled up to a total volume of 640 µl
containing the desired amount
of the Klenow-extended products. Sequencing was performed using Illumina 
HiSeq in the Memorial Sloan Kettering Cancer Center Genomics Core Laboratory. 


Bioinformatic analysis. Statistical analyses were performed using R version 2.15.3
(http://www.r-project.org/) or GraphPad Prism 6.0. Mapping of Illumina reads
to the target genome was performed using a pipeline essentially as described. In
brief, adaptor sequences were removed from both the 5′ and 3′ ends, then reads
were mapped to the S. cerevisiae genome (SGD version June 2008, that is, sacCer2)
using gmapper-ls (2_1_1b) from the SHRiMP mapping package. The specific
mapping parameters used were -U -g -1000 -q -1000 -m 10 -i -20 -h 100 -r 50% -o 
1001, which forces ungapped alignments (-U by itself did not suppress all gapped 
alignments so we set an effectively infinite gap opening penalty). To increase 
sensitivity for short reads we set the seeds to the following: -n 1 -s 1111111111, 
11110111101111, 1111011100100001111, 1111000011001101111. After mapping, 
the reads were separated into unique and multiple mapping sets, but only uniquely 
mapping reads were analysed in this study (multiple mapping reads constituted a 
small minority of the total). A full copy of the source code is available online at 
http://cbio.mskcc.org/public/Thacker_ZMM_feedback. 

Because the rDNA array is represented in the SGD assembly by only 1.9 copies 
of the repeat unit, oligos that span the boundary between repeats map to a single 
position even though they come from a repetitive sequence. Therefore, reads of 
this type were moved to the multiple mapping set. The wild-type data sets con- 
tained a small number of spurious reads (<0.4% of total) from contamination of 
the Spo11-oligo sequencing libraries with PCR primers from the TEL1 locus;
these reads were deleted from the maps. Because of the variable number of rG
residues added by terminal transferase to the 3′ end of Spo11 oligos and to the 3′
end of the reverse complementary strands, there is ambiguity in defining the precise
start and end positions for reads that map to positions starting with one or more C
residues or ending in one or more G residues. In such cases, the 5′ and 3′ ends of
each read were defined so as to provide the longest contiguous sequence match 
with the genome. 
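
One way to picture this rule is as a greedy extension of a provisional alignment. The Python sketch below is an illustration under stated assumptions, not the pipeline used here: it takes a read whose core (after stripping leading C and trailing G runs) has already been placed on a top-strand genome string, using 0-based half-open coordinates, and assigns stripped bases back to the oligo for as long as they also match the genome. All names and the toy genome are hypothetical.

```python
def longest_match_ends(genome: str, core_start: int, core_end: int,
                       leading_c: int, trailing_g: int) -> tuple[int, int]:
    """Extend a mapped core so the read matches the genome as far as possible.

    leading_c and trailing_g are the numbers of C (5' end) and G (3' end)
    bases that were stripped from the read as possibly tail-derived.
    """
    start = core_start
    while leading_c > 0 and start > 0 and genome[start - 1] == "C":
        start -= 1
        leading_c -= 1
    end = core_end
    while trailing_g > 0 and end < len(genome) and genome[end] == "G":
        end += 1
        trailing_g -= 1
    return start, end


# Toy genome: the core "GATTA" occupies positions 4 to 9 (half-open).
genome = "ATCCGATTAGGGTA"
print(longest_match_ends(genome, 4, 9, leading_c=3, trailing_g=2))
```

Bases that cannot be matched to the genome are left unassigned and are treated as tail-derived in this sketch.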

Raw and processed sequence reads have been deposited in the Gene Expression
Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/; accession number
GSE48299). This accession also contains the curated maps (unique mapping reads
only) in wiggle format to allow direct visualization in appropriate genome brow- 
sers, for example, the UCSC browser (http://genome.ucsc.edu/ using genome version 
sacCer2). 

For the studies here, our focus is on the number and position of DSBs rather 
than characteristics of the Spo11 oligos themselves, so maps were distilled to record 
just the positions of 5’ ends of oligos. Each map was normalized to the total number 
of reads that mapped uniquely to a chromosome (reads per million, RPM; excluding
reads mapping to mitochondrial DNA or the 2µ plasmid), then wild-type and zip3
maps were averaged. Normalized Spo11-oligo counts within the 3,600 previously
identified hotspots are compiled for each data set in Supplementary Table 4. In analyses
evaluating the fold change (that is, Fig. 4c, d, g and Extended Data Fig. 9a, d-f), we
assumed a global increase in Spo11-oligo number of 1.8-fold based on the
difference in peak steady-state levels (Fig. 1e). To prevent dividing by zero and to
minimize variability of ratios caused by small changes in denominators, we added
a small constant to numerator and denominator before taking the ratio (20 RPM
for hotspot-based ratios (approximately 15% of median hotspot Spo11-oligo count),
or 0.1 RPM per kb for bin-based ratios (0.13% of median Spo11-oligo density per
bin)). Where indicated, Spo11-oligo maps were smoothed with a 201-bp Hann
window. 
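
The normalization, ratio and smoothing steps described above can be sketched as follows (Python, standard library only). The order in which the 1.8-fold global scaling and the small constant are applied is an assumption made for illustration, as are the function names. The Hann window is normalized to unit sum so that smoothing preserves total signal away from the profile edges.

```python
import math


def to_rpm(counts):
    """Scale raw counts so that they sum to one million (RPM)."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def fold_change(mutant_rpm, wild_type_rpm, global_scale=1.8, constant=20.0):
    """Mutant/wild-type ratio with a stabilizing constant.

    Assumption: the mutant value is scaled by the global increase first,
    then the constant is added to numerator and denominator.
    """
    return (mutant_rpm * global_scale + constant) / (wild_type_rpm + constant)


def hann_window(width=201):
    """Symmetric Hann window normalized to sum to 1."""
    raw = [0.5 - 0.5 * math.cos(2 * math.pi * i / (width - 1)) for i in range(width)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth(profile, width=201):
    """Convolve a per-base-pair profile with a centred Hann window."""
    window = hann_window(width)
    half = width // 2
    smoothed = []
    for i in range(len(profile)):
        acc = 0.0
        for j, w in enumerate(window):
            k = i + j - half
            if 0 <= k < len(profile):
                acc += w * profile[k]
        smoothed.append(acc)
    return smoothed


# Hypothetical hotspot with 150 RPM in both genotypes before global scaling.
print(round(fold_change(150.0, 150.0), 3))
```

With the added constant, a hotspot with no reads in either genotype yields a ratio of 1 rather than an undefined value.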

For the correlation analysis in Fig. 4f and Extended Data Fig. 9b, chromosomes 
were divided into non-overlapping bins of the indicated sizes. Bins that
overlapped censored regions (within 20 kb of telomeres, within ±10 kb of
centromeres, or in the region from 60 kb leftward to 30 kb rightward of the rDNA) were
discarded. The published ChIP enrichment data (log₂ of ChIP/input; from GEO
accession GSE29860 (ref. 38) or Supplementary Table 3 from ref. 39) were
averaged within each bin, then compared to the mean log-fold change in Spo11-oligo
density in zip3 and correlation coefficients were calculated. The log-transformed
data were approximately normally distributed so we used Pearson’s r, but similar 
overall patterns were obtained if we used Kendall’s tau (data not shown). 
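
A compact way to express this binning and censoring procedure is sketched below in Python. It is illustrative only: the data layout (lists of position/value pairs on a single chromosome), the helper names and the default flank sizes as keyword arguments are assumptions, and the rDNA-proximal censoring is omitted for brevity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def censored_regions(chrom_len, centromere, telomere_flank=20_000, cen_flank=10_000):
    """Intervals (start, end) excluded near telomeres and the centromere."""
    return [(0, telomere_flank),
            (chrom_len - telomere_flank, chrom_len),
            (centromere - cen_flank, centromere + cen_flank)]


def binned_means(points, bin_size, censored):
    """Average (position, value) pairs in non-overlapping bins.

    Bins that overlap any censored interval are discarded.
    """
    sums = {}
    for pos, value in points:
        index = pos // bin_size
        start, end = index * bin_size, (index + 1) * bin_size
        if any(start < c_end and end > c_start for c_start, c_end in censored):
            continue
        total, n = sums.get(index, (0.0, 0))
        sums[index] = (total + value, n + 1)
    return {index: total / n for index, (total, n) in sums.items()}


def binned_correlation(track_a, track_b, bin_size, censored):
    """Pearson's r between two tracks over the bins present in both."""
    a = binned_means(track_a, bin_size, censored)
    b = binned_means(track_b, bin_size, censored)
    shared = sorted(set(a) & set(b))
    return pearson_r([a[i] for i in shared], [b[i] for i in shared])
```

Repeating `binned_correlation` over a range of `bin_size` values would give the kind of scale-dependent profile discussed in the text.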

Multiple linear regression was performed using the ‘lm’ function in R. Data
were averaged in non-overlapping bins of 35 kb, censored for subtelomeric,
pericentric and rDNA-proximal regions as described above. Principal component
analysis was performed on the correlation matrix of the Rec114, Mei4, Mer2,
Hop1 and Red1 ChIP data using the ‘princomp’ function in R. The first principal
component accounted for 92.7% of the variance in this data set; the remaining 
principal components were discarded as they accounted for only 4.1%, 2.2%, 0.8% 
and 0.3% of the variance, respectively. 
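
For principal component analysis on a correlation matrix, the proportion of variance carried by the first component equals the largest eigenvalue divided by the number of variables. The Python sketch below computes this by power iteration. It is a didactic stand-in for, not a reimplementation of, the R ‘princomp’ call, and the names and toy data are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    def standardize(col):
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        return [(v - mean) / sd for v in col]

    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_pc_variance_fraction(corr, iterations=500):
    """Largest eigenvalue of corr (by power iteration) divided by its trace."""
    p = len(corr)
    v = [1.0 / (2 ** i) for i in range(p)]  # arbitrary non-uniform start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return eigenvalue / sum(corr[i][i] for i in range(p))


# Toy data: three perfectly collinear tracks share a single component.
tracks = [[1.0, 2.0, 3.0, 4.0, 5.0],
          [2.0, 4.0, 6.0, 8.0, 10.0],
          [5.0, 4.0, 3.0, 2.0, 1.0]]
print(first_pc_variance_fraction(correlation_matrix(tracks)))
```

A first component that dominates in this way is what justifies replacing several mutually correlated ChIP tracks with a single composite predictor.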

To assess spatial correlations for the change in Spo11-oligo density (Fig. 4e), we
calculated the correlation coefficient (Pearson’s r) between the log-fold change at 
DSB hotspots and the log-fold change for hotspots located within a set of 5-kb 
windows centred a distance D to the right of each hotspot centre. We varied D from 
5 to 200 kb in steps of 2.5 kb and calculated the correlation coefficient separately for
each distance. For comparison, we performed the same analysis to evaluate the
correlation between absolute heat (log of the Spo11-oligo count) in hotspots and
the heat in 5-kb windows at varying distances. To generate randomized controls
for this analysis, we randomly reassigned the heats or log-fold change between
hotspots within a chromosome. This randomization strategy preserves the non-random
placement of hotspots relative to one another and preserves the correlated
behaviour (if any) across whole chromosomes. Randomization was repeated 100 times
to provide the estimates of the 95% confidence intervals shown in Fig. 4e.
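
The distance-dependent correlation and its randomized control can be sketched as below (Python, standard library only). Several details are assumptions made for illustration: a single chromosome is analysed, neighbours falling in each 5-kb window are averaged before correlation, and the 95% range is read from the sorted shuffled results by index. The toy data and all names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def neighbour_correlation(positions, values, distance, window=5_000):
    """r between each hotspot and the mean of hotspots lying in a window of
    the given width centred `distance` bp to its right."""
    own, neighbours = [], []
    for pos, value in zip(positions, values):
        low = pos + distance - window / 2
        high = pos + distance + window / 2
        nearby = [v for p, v in zip(positions, values) if low <= p < high]
        if nearby:
            own.append(value)
            neighbours.append(sum(nearby) / len(nearby))
    return pearson_r(own, neighbours)


def shuffled_interval(positions, values, distance, repeats=100, seed=0):
    """Approximate central 95% range of r after shuffling values among the
    hotspots of one chromosome (positions stay fixed)."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        shuffled = list(values)
        rng.shuffle(shuffled)
        results.append(neighbour_correlation(positions, shuffled, distance))
    results.sort()
    cut = int(0.025 * repeats)
    return results[cut], results[repeats - 1 - cut]


# Toy chromosome: hotspots every 5 kb with smoothly varying log-fold change.
positions = [i * 5_000 for i in range(60)]
values = [math.sin(i / 10) for i in range(60)]
observed = neighbour_correlation(positions, values, 5_000)
low, high = shuffled_interval(positions, values, 5_000)
print(observed, low, high)
```

Calling `neighbour_correlation` for each distance from 5 to 200 kb would trace out a decay curve of the kind shown for the real data.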
Extended Data Figure 1 | Chromosomal breaks in msh5 and zip1 mutants.
Representative pulsed-field gel Southern blots probed for chromosome IX are
shown, labelled as in Fig. 1b.
Extended Data Figure 2 | DSB formation appears normal in SPO11-Flag
and SPO11-PrA strains. a, Southern blots probed for chromosome III.
High molecular weight chromosomal DNA was purified 6 h after transfer to
sporulation medium from meiotic rad50S cultures carrying the indicated
SPO11 alleles (in spo11-yf the catalytic tyrosine 135 is mutated to
phenylalanine), then separated on pulsed-field electrophoresis gels. Samples
from a rad50S spo11-HA strain are shown for comparison;
haemagglutinin-tagged Spo11 has reduced DSB frequency. Each lane represents an
independent culture (SPO11+ samples from the same cultures were run on both gels).
PrA, protein A. b, Quantification of blots in panel a and separate blots (not shown)
probed for chromosomes VII or VIII. Break frequencies are per cent of DNA in
lane (mean ± s.d. of 3–4 cultures). Numbers in parentheses indicate values
from each tagged strain relative to SPO11+ for the same chromosome. Relative
DSB frequencies at the bottom are averages across the three chromosomes
assayed.


©2014 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 3 | Spo11-oligo complexes in msh5 and zip1 mutants. Representative time courses are shown.
[Figure panels: autoradiograph (Autorad) and western blot (WB) for wild type, msh5 and zip1.]
Extended Data Figure 4 | Analysis of recombination at three natural 
DSB hotspots. a, b, Recombination reporters at the ERG1 (a) and GAT1 
(b) hotspots. c–e, Representative Southern blots of parental and recombinant 
DNA molecules at CCT6 (c), ERG1 (d) and GAT1 (e). The arrowhead in 
e indicates a non-reproducible radiolabelled species. f, Local distribution of 
DSBs around recombination reporter locations is not altered in zip3 mutants. 
Spo11-oligo profiles (averages for wild type and zip3 mutant) are smoothed 
with a 201-bp Hann window; zip3 values are offset to separate profiles. 
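
The Hann-window smoothing mentioned in this caption can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `hann_window` and `smooth` are hypothetical, the window is the standard symmetric Hann form normalized to unit sum, and positions beyond the ends of the profile are treated as zero.

```python
import math

def hann_window(width):
    """Symmetric Hann window of odd width >= 3, normalized to sum to 1."""
    if width < 3 or width % 2 == 0:
        raise ValueError("width must be an odd integer >= 3")
    raw = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (width - 1))
           for n in range(width)]
    total = sum(raw)
    return [w / total for w in raw]

def smooth(profile, width=201):
    """Smooth a per-base-pair profile with a centred Hann window.

    Assumption: values outside the profile are taken as zero.
    """
    window = hann_window(width)
    half = width // 2
    n = len(profile)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(window):
            k = i + j - half  # index of the neighbour weighted by window[j]
            if 0 <= k < n:
                acc += w * profile[k]
        smoothed.append(acc)
    return smoothed

# Hypothetical toy profile: a single spike is spread over its neighbours.
print(smooth([0, 0, 0, 4, 0, 0, 0], width=5))
```

Normalizing the window to unit sum keeps the total signal unchanged away from the profile edges, so smoothed and unsmoothed profiles remain directly comparable.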
Extended Data Figure 5 | Direct analysis of DSB formation at natural 
hotspots. a–d, Representative Southern blots of DNA separated on a 
conventional agarose gel and probed for GAT1 (a), CCT6 (b, c) and ERG1 (d). 
The arrowhead in a indicates signal from the CCT6 parental band that 
remained after stripping and reprobing for GAT1. e, Quantifications for 
b–d (mean ± s.d. for 3 cultures). 
Extended Data Figure 6 | Spo11-oligo complexes in msh5 ndt80 double 
mutant. Representative time courses are shown. 
Extended Data Figure 7 | Effects of dmc1 deletion or spo11 hypomorphic 
mutation on ZMM mutant phenotypes. a, b, ZMM status is irrelevant in a 
dmc1 background. Broken chromosomes accumulate to similar levels in a dmc1 
single mutant and dmc1 zmm double mutants. Representative pulsed-field gel 
Southern blots probed for chromosome IX are in a and Poisson-corrected 
quantification of DSBs is in b (mean ± s.d., 3 cultures). c, Reducing Spo11 
activity in a zip3 mutant partially alleviates the prophase I delay/arrest. 
Meiotic progression was assessed by staining with DAPI (4’,6-diamidino-2- 
phenylindole) and measuring the percentage of cells that had completed 
meiosis I (MI) with or without completing meiosis II (± MII). Data are 
means ± s.d. for 3 cultures, except wild type and spo11-HA, each analysed once. 
Extended Data Figure 8 | Spo11-oligo mapping in wild type and zip3 
mutant. a, b, Quantitative reproducibility of Spo11-oligo maps. In 
a, comparisons are shown for individual wild type (WT) or zip3 data sets from 
the present study, or the previously published spo11-HA data (from ref. 11). 
Uniquely mapped Spo11 oligos were summed in non-overlapping 5-kb bins 
and expressed as RPM per kb (plotted on a log scale). In b, pairwise correlation 
coefficients for the data sets from the current study are shown (Pearson’s r; box 
colours scaled from blue to red proportional to strength of correlation). For the 
comparison of this study’s wild-type average with data from Pan et al., 
r = 0.949. Note that Pan et al. used a different strain background with different 
auxotrophies, which may alter DSB distributions, and a hypomorphic spo11 
allele (spo11-HA), which may affect DSBs to different extents at different 
locations. Note that biological replicates (WT-1 versus WT-2 or zip3-1 versus 
zip3-2) agreed better than comparisons between cultures of different genotype. 
c, DSBs form at the same hotspots and with similar distribution within and 
between hotspots in wild type and zip3. Unsmoothed Spo11-oligo maps are 
shown in the vicinity of the well-characterized ARE1 (YCR048W) hotspot. 
Extended Data Figure 9 | Changes in the DSB landscape in zip3 mutant. 
a, Change in Spo11-oligo counts in hotspots grouped by chromosomal context. 
Tel, within 20 kb of telomeres; Cen, within ±10 kb of centromeres; rDNA, 
from 60 kb leftward to 30 kb rightward of rDNA; Interstitial, all others. Dashed 
lines mark values assumed as no change and average change (1.8-fold). Boxes 
indicate median and interquartile range; whiskers indicate the most extreme 
data points that are ≤1.5 times the interquartile range from the box; 
individual points are outliers. Subtelomeric and pericentric zones show less 
increase in zip3 on average; thus, ZMM-dependent feedback contributes less 
than other, unknown factors to suppressing DSBs in these regions. The zone 
near the rDNA showed no increase or was even decreased; thus, zip3 mutants 
are competent for this region’s DSB suppression, which is dependent on the 
ATPase Pch2 and the replication factor Orc1 (ref. 54). Note that the remaining 
interstitial hotspots showed highly variable response to zip3 mutation 
(>20-fold). b, Correlation between log-fold change in Spo11-oligo counts in zip3 
and the binding of the indicated proteins, binned in non-overlapping windows 
of varying size. Closed symbols, P < 0.05. ChIP data are from ref. 38. c, Average 
ChIP profiles around interstitial hotspots divided into three equal-sized groups 
according to the average fold change in zip3. Top, the box and whisker plot 
(as described for a) shows the distribution of fold changes for the three groups. 
Bottom, ChIP profiles for each of the indicated proteins. Note that the profiles 
lie atop one another for Rec102 and Rec104. Dashed arrows indicate direction 
of the change in the average profiles with increasing fold change in zip3. 
ChIP data are from refs 38 and 39. d, High degree of colinearity of 
log-transformed ChIP data for Rec114, Mei4 and Mer2 (which are essential for 
DSB formation) and Hop1 and Red1 (axis proteins that promote normal 
DSB formation). More than 90% of the variance for this combination of ChIP 
data is captured in the first principal component (PC1). The high degree of 
correlation between these proteins was described previously. e, Correlations 
between the fold change in zip3 (zip3 FC, log scale and assuming 1.8-fold 
increase genome-wide) and various chromosomal features: principal component 1 
for Rec114, Mei4, Mer2, Hop1 and Red1 ChIP data (same as in d); chromosome 
size (log(bp)); G+C content (%); and ChIP data for the indicated proteins 
(log). In d and e, top right panels show pairwise scatter plots and bottom left 
panels show corresponding correlation coefficients (Pearson’s r) for data for 
interstitial regions binned in 35-kb non-overlapping windows. Essentially 
identical results were obtained with different window sizes (20–40 kb) or with 
varying placement of windows (data not shown). f, Essentially no correlation 
between DSB activity in wild type and change in zip3, whether considering 
interstitial regions divided into non-overlapping 35-kb bins (upper panel) or 
interstitial hotspots (lower panel). A 1.8-fold increase genome-wide in zip3 is 
assumed. Note: fold change is labelled according to a linear scale but plotted on a 
log scale in panels a, c, f. 
Two γ-ray bursts from dusty regions with little molecular gas 

B. Hatsukade, K. Ohta, A. Endo, K. Nakanishi, Y. Tamura, T. Hashimoto & K. Kohno 


Long-duration γ-ray bursts are associated with the explosions of 
massive stars and are accordingly expected to reside in star-forming 
regions with molecular gas (the fuel for star formation). Previous 
searches for carbon monoxide (CO), a tracer of molecular gas, in 
burst host galaxies did not detect any emission. Molecules have 
been detected in absorption in the spectra of γ-ray burst afterglows, 
and the molecular gas is similar to the translucent or diffuse molecular 
clouds of the Milky Way. Absorption lines probe the interstellar 
medium only along the line of sight, so it is not clear whether the 
molecular gas represents the general properties of the regions where 
the bursts occur. Here we report spatially resolved observations of 
CO line emission and millimetre-wavelength continuum emission in 
two galaxies hosting γ-ray bursts. The bursts happened in regions rich 
in dust, but not particularly rich in molecular gas. The ratio of molecular 
gas to dust (<9–14) is significantly lower than in star-forming 
regions of the Milky Way and nearby star-forming galaxies, suggesting 
that much of the dense gas where stars form has been dissipated 
by other massive stars. 
We selected the two γ-ray burst (GRB) hosts (GRB 020819B at a 
redshift of z = 0.41 and GRB 051022 at z = 0.81) with high star-formation 
rates (SFRs) and high gas metallicity among GRB host galaxies to 
maximize the possibility of detecting the CO emission line and dust 
continuum emission. The GRB 020819B host shows an extinction-corrected 
SFR of ~10–30 M⊙ yr⁻¹ (where M⊙ indicates solar mass) 
derived from ultraviolet continuum emission, the Hα emission line, 
and spectral energy distribution (SED) fitted using infrared data. 
The SFRs at spatially resolved positions, also derived from the Hα 
emission line, are 10.2 M⊙ yr⁻¹ and 23.6 M⊙ yr⁻¹ at the nuclear 
region and at the GRB explosion site, respectively. The host galaxy of 
GRB 051022 shows an extinction-corrected SFR of ~20–70 M⊙ yr⁻¹ 
derived from ultraviolet continuum emission, the [O II] emission line 
at rest-frame wavelength λ = 3,727 Å, the SED fitted with infrared data, 
and radio continuum emission. The gas metallicity is measured at 
the GRB 020819B site, the nuclear region of the GRB 020819B host, 
and at the GRB 051022 host, and they all have at least solar metallicity. 
The two GRBs are classified as ‘dark GRBs’, whose afterglow 
is optically dark compared with what is expected from X-ray afterglows. 
The origin of the dark GRBs is not yet well understood, but a 
plausible explanation is dust obscuration along the line of sight to 
GRBs. 

Figure 1 | CO maps, 1.2-mm continuum maps and optical images of the 
GRB hosts. The magenta cross represents the position of the radio afterglow. 
The ALMA beam size is shown in the lower left corners of a, b, d and 
e (black and white ovals). a, Velocity-integrated CO(3–2) intensity map. 
Contours start from ±3σ with 2σ step (1σ = 0.040 Jy beam⁻¹ km s⁻¹). 
b, 1.2-mm continuum map. Contours start from ±3σ with 1σ 
step (1σ = 0.030 mJy beam⁻¹). c, Optical R-band image obtained 
with the Gemini North Telescope. d, Velocity-integrated CO(4–3) 
intensity map. Contours start from 3σ with 1σ step 
(1σ = 0.037 Jy beam⁻¹ km s⁻¹). e, 1.2-mm continuum. Contours are 
±3σ (1σ = 0.032 mJy beam⁻¹). f, Optical R-band image obtained 
with the Gemini South Telescope. 

Figure 2 | CO spectra of the GRB hosts. Continuum emission is subtracted. 
a, CO(3–2) spectrum of the nuclear region of the GRB 020819B host at 
20 km s⁻¹ resolution. A Gaussian fit to the emission line gives a redshift of 
z = 0.410 and a velocity width of 167 km s⁻¹ (FWHM). b, CO(4–3) spectrum of 
the GRB 051022 host at 30 km s⁻¹ resolution. A Gaussian fit to the emission 
line gives a redshift of z = 0.806 and a velocity width of 176 km s⁻¹ (FWHM). 

We conducted observations of the CO emission line and 1.2-mm 
continuum towards the GRB hosts using the Atacama Large 
Millimeter/submillimeter Array (ALMA). We observed the redshifted 
CO(3–2) line for the GRB 020819B host and the CO(4–3) line for 
the GRB 051022 host. The angular resolution is ~0.8″ × 0.7″ (~4 kpc 
× 4 kpc) and ~1.0″ × 0.7″ (~8 kpc × 5 kpc) (full-width at half maximum; 
FWHM) for the GRB 020819B host and the GRB 051022 
host, respectively. The GRB 020819B host is spatially resolved in the 
observations. The CO emission line is clearly detected at the nuclear 
region of the GRB 020819B host and the GRB 051022 host (Figs 1a and 
d and 2). While molecular gas has been detected in absorption in spectra 
of GRB afterglows, this is the first detection of spatially resolved 
molecular gas emission in GRB hosts. The component size 
of the CO emission (deconvolved from beam) derived from a Gaussian 
fitting is 3.2 × 1.5 kpc (FWHM). The molecular gas mass estimated 
from the CO emission is M_gas = 2.4 × 10⁹ M⊙ and 2.1 × 10⁹ M⊙ for 
the nuclear region of the GRB 020819B host and the GRB 051022 host, 
respectively (see Methods and Table 1). The molecular gas mass is 
comparable to those of local massive spiral galaxies, and lower 
than those of z ~ 1–2 normal star-forming galaxies or 
submillimetre-luminous galaxies. The ratio of molecular gas mass to stellar 
mass for the hosts is ~0.1, which is comparable to those of local 
spiral galaxies. 
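
The conversion from beam size in arcseconds to physical size in kiloparsecs quoted above can be checked with a short sketch. It assumes the flat cosmology stated in the Table 1 notes (H0 = 71 km s⁻¹ Mpc⁻¹, Ω_M = 0.27, Ω_Λ = 0.73), neglects radiation, and uses a simple trapezoid integration; the function names are hypothetical and this is not the authors' code.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def comoving_distance_mpc(z, h0=71.0, omega_m=0.27, omega_l=0.73, steps=2000):
    """Line-of-sight comoving distance (Mpc) in a flat Lambda-CDM cosmology."""
    dz = z / steps
    total = 0.0
    for i in range(steps + 1):
        zi = i * dz
        inv_e = 1.0 / math.sqrt(omega_m * (1.0 + zi) ** 3 + omega_l)
        total += inv_e if 0 < i < steps else 0.5 * inv_e  # trapezoid rule
    return C_KM_S / h0 * total * dz

def kpc_per_arcsec(z):
    """Physical scale (kpc) subtended by one arcsecond at redshift z."""
    d_a_mpc = comoving_distance_mpc(z) / (1.0 + z)  # angular-diameter distance
    return d_a_mpc * 1.0e3 * math.radians(1.0 / 3600.0)

# Beam sizes (arcsec) and CO redshifts taken from the text.
for name, z, beam in (("GRB 020819B host", 0.410, (0.8, 0.7)),
                      ("GRB 051022 host", 0.806, (1.0, 0.7))):
    scale = kpc_per_arcsec(z)
    print(name, round(scale, 2), [round(b * scale, 1) for b in beam])
```

Under these assumptions the scale is a little over 5 kpc per arcsecond at z = 0.41 and about 7.5 kpc per arcsecond at z = 0.81, consistent with the rounded beam sizes given in the text.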

The 1.2-mm continuum emission is also detected in both GRB hosts 
(Fig. 1b, e). The spatially resolved continuum map of the GRB 020819B 
host shows that the emission is significantly detected only at a 
star-forming region ~3″ (16 kpc in projection) away from the nuclear 
region, where the GRB explosion occurred. The size of the continuum 
emission deconvolved from the telescope beam is ~1.7 kpc × 1.0 kpc. 
We regard the continuum emission as dust thermal emission originating 
in star-forming activity (see Supplementary Information). By 
assuming that the dust emission is described as a modified blackbody 
and using the dust temperature and emissivity index derived from 
fitting, we derive the dust masses of M_dust = 4.8 × 10⁷ M⊙ and 
2.9 × 10⁷ M⊙ for the GRB 020819B site and the GRB 051022 host, 
respectively (see Methods). The far-infrared luminosity and SFR are L_FIR = 
1.1 × 10¹¹ L⊙ (where L⊙ is the solar luminosity) and SFR = 18 M⊙ yr⁻¹ 
for the GRB 020819B site, and L_FIR = 1.9 × 10¹¹ L⊙ and SFR = 32 M⊙ yr⁻¹ 
for the GRB 051022 host. The SFRs are comparable to the 
extinction-corrected SFRs derived from ultraviolet and optical observations, 
suggesting that there is no sign of an extra, optically completely 
invisible portion of star formation that cannot be recovered by extinction 
correction. 

Of particular interest is that the spatial distributions of molecular 
gas and dust are clearly different in the GRB 020819B host. The ratio of 
molecular gas mass to dust mass of the GRB 020819B host is >51–60 
and <9–14 (3σ limits with uncertainty from dust mass) at the nuclear 
region and the GRB site, respectively. The ratio in the GRB site is 
significantly lower than that of the nuclear region, indicating that 
the GRB occurred under particular circumstances within the host. 
The molecular gas-to-dust ratio at the GRB site is also lower than those 
of the Milky Way and nearby star-forming galaxies, suggesting that 
the star-forming environment where GRBs occur is different from 
those in local galaxies. While the correlation between gas-to-dust ratio 
and metallicity has been observed, it is unlikely that the large difference 
between the GRB site and the nuclear region is attributable to 
the difference in metallicity because both regions have a similar metallicity. 
The difference of distribution between molecular gas and dust is 
also seen in the GRB 051022 host and the GRB site seems to be a 
dust-rich region, although the angular resolution is not good enough to be 
certain. The possible reasons for the deficit of molecular gas in the GRB 
site are that much of the dense gas where stars form has been incorporated 
into stars, or dissipated by a strong interstellar ultraviolet 
radiation field, which is expected in regions with intense star formation. 
The lack of molecular gas in optical spectra of GRB afterglows 
has been reported and a possible explanation is the dissociation of 
molecules by ambient ultraviolet radiation with 10–100 times the 
Galactic mean value from the star-forming regions where GRB progenitors 
reside. GRB hosts with a mean ultraviolet radiation field 
of 35–350 times the Galactic mean value have been observed. The 
molecular gas-to-dust ratio in GRB hosts could be an important indicator 
of an environment where GRBs occur. 

Table 1 | Properties of GRB host galaxies 

Property                     GRB 020819B nuclear region   GRB 020819B site      GRB 051022 
z_CO                         0.410                                              0.806 
CO transition                3–2                                                4–3 
L'_CO(1–0) (K km s⁻¹ pc²)    (5.5 ± 0.4) × 10⁸            <1.3 × 10⁸            (4.9 ± 0.9) × 10⁸ 
M_gas (M⊙)                   (2.4 ± 0.2) × 10⁹            <5.4 × 10⁸            (2.1 ± 0.4) × 10⁹ 
S_1.2mm (mJy)                <0.12                        0.14 ± 0.03           0.10 ± 0.03 
M_dust (M⊙)                  <4.2 × 10⁷                   (4.8 ± 1.0) × 10⁷     (2.9 ± 0.9) × 10⁷ 
L_FIR (L⊙)                   <9.3 × 10¹⁰                  (1.1 ± 0.2) × 10¹¹    (1.9 ± 0.6) × 10¹¹ 
SFR (M⊙ yr⁻¹)                <16                          18 ± 4                32 ± 10 
M_gas/M_dust                 >51–60                       <9–14                 58–86 

The errors represent root-mean-square (1σ) uncertainties from the photometry error. The limits are 3σ. We adopt a cosmology with H0 = 71 km s⁻¹ Mpc⁻¹, Ω_M = 0.27 and Ω_Λ = 0.73. For details, see Methods. 
z_CO is the redshift determined from the CO line. $L'_{\rm CO(1-0)}$ is the CO(1–0) luminosity derived from $L'_{\rm CO} = 3.25 \times 10^{7}\, S_{\rm CO}\Delta v\, \nu_{\rm obs}^{-2} D_L^{2} (1+z)^{-3}$, where $L'_{\rm CO}$ is in units of K km s⁻¹ pc², $S_{\rm CO}\Delta v$ is the velocity-integrated 
flux in Jy km s⁻¹, $\nu_{\rm obs}$ is the observed line frequency in GHz and $D_L$ is the luminosity distance in Mpc. We assume a CO line ratio of CO(3–2)/CO(1–0) = 0.93 and CO(4–3)/CO(1–0) = 0.85, which are the values for 
the local star-forming galaxy M82. M_gas is the molecular gas mass derived from $M_{\rm gas} = \alpha_{\rm CO} L'_{\rm CO(1-0)}$, where $\alpha_{\rm CO}$ is the CO-to-molecular gas mass conversion factor, for which we adopt the Galactic value $\alpha_{\rm CO} = 4.3$ in units of 
M⊙ (K km s⁻¹ pc²)⁻¹. S_1.2mm is the 1.2-mm continuum flux. M_dust is the dust mass derived from $M_{\rm dust} = S_{\rm obs} D_L^{2} / [(1+z)\, \kappa_d(\nu_{\rm rest})\, B(\nu_{\rm rest}, T_d)]$, where $S_{\rm obs}$ is the observed flux density, $\nu_{\rm rest}$ is the rest frequency, $\kappa_d(\nu_{\rm rest})$ 
is the rest-frequency mass absorption coefficient and $B(\nu_{\rm rest}, T_d)$ is the Planck function. L_FIR is the far-infrared luminosity derived from $L_{\rm FIR} = 4\pi M_{\rm dust} \int_0^{\infty} \kappa_d(\nu) B(\nu, T_d)\, d\nu$. SFR is the star-formation rate 
derived from SFR (in M⊙ yr⁻¹) = 1.72 × 10⁻¹⁰ L_FIR (in L⊙). 

The occurrence of GRB 020819B in a dust-rich region supports the 
idea that the dust extinction is the cause of the darkness of the optical 
afterglow. The molecular gas-to-dust ratio in the GRB site is comparable 
to or lower than the ratios in nuclear regions of local galaxies of 
~20–40 (ref. 29) and submillimetre-luminous galaxies of ~50 (ref. 
30), suggesting the existence of GRBs that could occur in dusty galaxies 
such as submillimetre-luminous galaxies. 


METHODS SUMMARY 


We conducted ALMA observations of the GRB 020819B host and the GRB 051022 
host at 245.072 GHz and 255.142 GHz, respectively, with a bandwidth of 1,875 MHz 
and with 24–27 antennas. The data were reduced with the Common Astronomy 
Software Applications package in a standard manner. The maps were processed 
with the CLEAN algorithm with Briggs weighting (with the robust parameter 
equal to 0.5). The final synthesized beam size (FWHM) is ~0.8″ × 0.7″ and 
~1.0″ × 0.7″ for the GRB 020819B host and the GRB 051022 host, respectively. 
We derived the molecular gas mass of M_gas = (2.4 ± 0.2) × 10⁹ M⊙ and 
(2.1 ± 0.4) × 10⁹ M⊙ for the nuclear region of the GRB 020819B host and the 
GRB 051022 host, respectively. Here we assume CO line ratios of 
CO(3–2)/CO(1–0) = 0.93 and CO(4–3)/CO(1–0) = 0.85, which are the values for 
the local star-forming galaxy M82, by considering the star-forming property of 
the hosts. We adopt the CO-to-molecular-gas mass conversion of Galactic value 
(4.3 M⊙ (K km s⁻¹ pc²)⁻¹) because the metallicity of the two hosts is close to 
the solar metallicity. To estimate a dust temperature and an emissivity index, 
we fitted a single-temperature modified blackbody form to the 
far-infrared–millimetre photometry of the ALMA 1.2-mm data and the Herschel 
Space Observatory 100-μm, 160-μm and 250-μm data. The best-fitting results are 
T_dust = 28 ± 3 K and β = 1.9 ± 0.3 for the GRB 020819B host and 
T_dust = 34 ± 6 K and β = 1.8 ± 0.5 for the GRB 051022 host. By using the 
best-fitting modified blackbody functions, we derived dust masses of 
(4.8 ± 1.0) × 10⁷ M⊙ and (2.9 ± 0.9) × 10⁷ M⊙ for the GRB 020819B site and the 
GRB 051022 host, respectively. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 

Observations, data reduction and results. We conducted ALMA band-6 observations 
of the GRB 020819B host on 2012 November 17 with 27 antennas and the 
GRB 051022 host on 2012 November 21 and December 2 with 24 antennas during 
the ALMA cycle-0 session. The range of baseline lengths of the configuration is 
15–402 m and 15–382 m for the observations of the GRB 020819B host and 
the GRB 051022 host, respectively. The maximum recoverable scale (the largest 
angular structure to which a given array is sensitive) for the array configurations 
is 10″, which is large enough to cover the angular scale of the host galaxies. 
The correlator was used in the frequency domain mode with a bandwidth 
of 1,875 MHz (488.28 kHz × 3,840 channels). Four basebands were used, giving 
a total bandwidth of 7.5 GHz. We observed the redshifted CO(3–2) line at 
245.072 GHz for the GRB 020819B host and the redshifted CO(4–3) line at 
255.142 GHz for the GRB 051022 host. Uranus was observed as a flux calibrator 
and a quasar J2253+161 (3C 454.3) was observed for bandpass and phase 
calibrations. The on-source time is 47 min and 71 min for the GRB 020819B host and 
the GRB 051022 host, respectively. The data were reduced with the Common 
Astronomy Software Applications package in a standard manner. The maps were 
processed with the CLEAN algorithm with the Briggs weighting (with robust 
parameter of 0.5). The final synthesized beam size (FWHM) is ~0.8″ × 0.7″ and 
~1.0″ × 0.7″ for the GRB 020819B host and the GRB 051022 host, respectively. 
1.2-mm continuum maps were created with a total bandwidth of ~7.5 GHz, 
excluding channels with emission line. CO emission and 1.2-mm continuum 
emission are detected at both GRB host galaxies (Figs 1a, b, d and e, and 2). The 
GRB 020819B host is spatially resolved in the observations, while the GRB 051022 
host is not. The velocity-integrated CO intensity is S_CO(3–2) = 0.53 ± 0.04 Jy km s⁻¹ 
and S_CO(4–3) = 0.19 ± 0.03 Jy km s⁻¹ at the nucleus of the GRB 020819B host 
and the GRB 051022 host, respectively. The 1.2-mm continuum flux density 
is S_1.2mm = 0.14 ± 0.03 mJy and S_1.2mm = 0.10 ± 0.03 mJy at the explosion site of 
GRB 020819B and the GRB 051022 host, respectively. 
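
The correlator arithmetic and the link between tuned frequency and redshift can be illustrated with a small sketch. The CO rest frequencies (345.796 GHz for J = 3–2 and 461.041 GHz for J = 4–3) are standard laboratory values that are not given in the text, and the function names are hypothetical.

```python
C_KM_S = 299792.458  # speed of light in km/s

# Rest frequencies of the CO rotational lines in GHz (standard laboratory values).
CO_REST_GHZ = {"3-2": 345.7959899, "4-3": 461.0407682}

def redshift_from_line(rest_ghz, observed_ghz):
    """Redshift implied by observing a line of known rest frequency."""
    return rest_ghz / observed_ghz - 1.0

def channel_width_km_s(channel_khz, observed_ghz):
    """Velocity width of one spectral channel at the observed frequency."""
    return C_KM_S * (channel_khz * 1.0e3) / (observed_ghz * 1.0e9)

# Values quoted in the Methods text.
baseband_mhz = 488.28 * 3840 / 1.0e3   # channel width (kHz) x number of channels
total_ghz = 4 * baseband_mhz / 1.0e3   # four basebands

print(round(baseband_mhz, 1), round(total_ghz, 2))
print(round(redshift_from_line(CO_REST_GHZ["3-2"], 245.072), 4))
print(round(redshift_from_line(CO_REST_GHZ["4-3"], 255.142), 4))
print(round(channel_width_km_s(488.28, 245.072), 3))
```

The native channels are well under 1 km s⁻¹ wide, so the 20 km s⁻¹ and 30 km s⁻¹ spectra in Fig. 2 correspond to binning several tens of channels.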

Molecular gas mass. CO luminosity is derived from 
$L'_{\rm CO} = 3.25 \times 10^{7}\, S_{\rm CO}\Delta v\, \nu_{\rm obs}^{-2} D_L^{2} (1+z)^{-3}$ (ref. 23), 
where $L'_{\rm CO}$ is in units of K km s⁻¹ pc², $S_{\rm CO}\Delta v$ is 
the velocity-integrated flux in Jy km s⁻¹, $\nu_{\rm obs}$ is the observed line frequency in 
GHz, and $D_L$ is the luminosity distance in Mpc. We assume a CO line ratio of 
CO(3–2)/CO(1–0) = 0.93 and CO(4–3)/CO(1–0) = 0.85, which are the values 
for the local star-forming galaxy M82 (ref. 32), by considering the star-forming 
property of the host galaxies. The derived CO(1–0) luminosities are 
(5.5 ± 0.4) × 10⁸ K km s⁻¹ pc² and (4.9 ± 0.9) × 10⁸ K km s⁻¹ pc² for the nuclear region 
of the GRB 020819B host and the GRB 051022 host, respectively. The molecular 
gas mass is derived from $M_{\rm gas} = \alpha_{\rm CO} L'_{\rm CO(1-0)}$, where $\alpha_{\rm CO}$ is the 
CO-to-molecular-gas mass conversion factor in units of M⊙ (K km s⁻¹ pc²)⁻¹ including He mass. It 
is thought that there is a correlation between $\alpha_{\rm CO}$ and metallicity in the local 
Universe and at z ~ 1–2 (refs 33, 34); $\alpha_{\rm CO}$ decreases with increasing metallicity. 
Because the metallicity of the two hosts is close to the solar metallicity, we adopt 
the Galactic value of $\alpha_{\rm CO}$ = 4.3 M⊙ (K km s⁻¹ pc²)⁻¹ (ref. 35). The derived molecular 
gas masses are M_gas = (2.4 ± 0.2) × 10⁹ M⊙ and (2.1 ± 0.4) × 10⁹ M⊙ for the 
nuclear region of the GRB 020819B host and the GRB 051022 host, respectively. 
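
The chain from integrated CO flux to molecular gas mass described above can be sketched as follows. The luminosity distance is computed for the flat cosmology adopted in the paper (H0 = 71 km s⁻¹ Mpc⁻¹, Ω_M = 0.27, Ω_Λ = 0.73) with a plain trapezoid integration that neglects radiation; the function names are hypothetical and this is an illustration of the stated formulas rather than the authors' code.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def luminosity_distance_mpc(z, h0=71.0, omega_m=0.27, omega_l=0.73, steps=2000):
    """Luminosity distance (Mpc) in a flat Lambda-CDM cosmology."""
    dz = z / steps
    total = 0.0
    for i in range(steps + 1):
        zi = i * dz
        inv_e = 1.0 / math.sqrt(omega_m * (1.0 + zi) ** 3 + omega_l)
        total += inv_e if 0 < i < steps else 0.5 * inv_e  # trapezoid rule
    return (1.0 + z) * C_KM_S / h0 * total * dz

def co_line_luminosity(s_co_dv, nu_obs_ghz, z):
    """L'_CO in K km/s pc^2 from the integrated flux (Jy km/s)."""
    d_l = luminosity_distance_mpc(z)
    return 3.25e7 * s_co_dv * nu_obs_ghz ** -2 * d_l ** 2 * (1.0 + z) ** -3

def molecular_gas_mass(l_co_10, alpha_co=4.3):
    """Molecular gas mass (solar masses) for a given CO(1-0) luminosity."""
    return alpha_co * l_co_10

# Nuclear region of the GRB 020819B host: CO(3-2), line ratio 0.93.
l_10_a = co_line_luminosity(0.53, 245.072, 0.410) / 0.93
# GRB 051022 host: CO(4-3), line ratio 0.85.
l_10_b = co_line_luminosity(0.19, 255.142, 0.806) / 0.85

for label, l_10 in (("GRB 020819B nucleus", l_10_a), ("GRB 051022 host", l_10_b)):
    print(label, f"{l_10:.2e}", f"{molecular_gas_mass(l_10):.2e}")
```

With the flux densities, frequencies and line ratios quoted in the text, this reproduces CO(1–0) luminosities of about 5 × 10⁸ K km s⁻¹ pc² and gas masses of about 2 × 10⁹ M⊙ for both hosts, in line with the derived values.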
Photometry of Herschel Space Observatory data. We used the Herschel 
Photodetector Array Camera and Spectrometer (PACS) data in the archive. 
We conducted aperture photometry on the 160-μm image of the GRB 051022 
host with SExtractor and obtained a flux density of S_160μm = 12 mJy (with 
about 30% photometry error). There is no significant contamination from nearby 
sources to the photometry. The FWHM of the source size is ~14″ at 160 μm, 
which is comparable to the FWHM of PACS beam size. We also measured the 
centroid of the 100-μm emission of the GRB 020819B host and found that the 
emission is in between the galaxy centre and the peak of 1.2-mm continuum. It is 
possible that dust is more widely spread in the host galaxy, although the angular 
resolution is inadequate (FWHM of ~7″). 

Modified blackbody fit. To estimate a dust temperature ($T_{\rm dust}$) and an emissivity 
index ($\beta$), we fitted the far-infrared–millimetre photometry data of Herschel at 
100 μm, 160 μm and 250 μm (ref. 9), and ALMA at 1.2 mm with a single-temperature 
modified blackbody form of $S_\nu \propto \nu^{3+\beta} / (\exp(h\nu/kT_{\rm dust}) - 1)$, where $S_\nu$ is 
the flux density and $\nu$ is the frequency (Extended Data Fig. 1). The best-fitting 
results are $T_{\rm dust}$ = 28 ± 3 K and $\beta$ = 1.9 ± 0.3 for the GRB 020819B host and 
$T_{\rm dust}$ = 34 ± 6 K and $\beta$ = 1.8 ± 0.5 for the GRB 051022 host. We note that the 
missing flux in the ALMA observations in the scale of the PACS beam size 
is negligible. The dust temperatures are within the typical range of z ~ 0–2 
star-forming galaxies. The dust temperatures of the hosts were derived in a 
previous study with a SED model fit to optical–infrared data including Herschel 
photometry: $T_{\rm dust}$ = 24.4 K and 52.6 K for the GRB 020819B host and the 
GRB 051022 host, respectively. The dust temperature of the GRB 051022 host 
from that study is higher than in this work. This may be due to their lack of 
photometric data at >160 μm, which is essential to fit the dust SED. 
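
As a consistency check on the modified blackbody form, the sketch below scales the best-fitting shape for the GRB 051022 host to the ALMA 1.2-mm flux density and evaluates it at 160 μm, where the Herschel photometry gives about 12 mJy. It assumes that the frequency in the formula is the rest-frame frequency, that the continuum is observed at exactly 1.2 mm, and that the host redshift is the CO value z = 0.806; the function names are hypothetical.

```python
import math

H_PLANCK = 6.62607015e-34   # J s
K_BOLTZ = 1.380649e-23      # J / K
C_M_S = 2.99792458e8        # m / s

def modified_blackbody_shape(nu_rest_hz, t_dust, beta):
    """Unnormalized S_nu ~ nu^(3+beta) / (exp(h nu / k T) - 1)."""
    x = H_PLANCK * nu_rest_hz / (K_BOLTZ * t_dust)
    return nu_rest_hz ** (3.0 + beta) / math.expm1(x)

def predict_flux(s_ref, lambda_ref_m, lambda_new_m, z, t_dust, beta):
    """Scale a reference flux density to another observed wavelength."""
    nu_ref = C_M_S / lambda_ref_m * (1.0 + z)  # rest-frame frequencies
    nu_new = C_M_S / lambda_new_m * (1.0 + z)
    return s_ref * (modified_blackbody_shape(nu_new, t_dust, beta)
                    / modified_blackbody_shape(nu_ref, t_dust, beta))

# GRB 051022 host: 0.10 mJy at 1.2 mm, T_dust = 34 K, beta = 1.8.
s_160 = predict_flux(0.10, 1.2e-3, 160.0e-6, 0.806, 34.0, 1.8)
print(round(s_160, 1), "mJy predicted at 160 um")
```

Under these assumptions the predicted 160-μm flux density comes out close to the measured value, which is what one expects if both points lie on the fitted curve.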

Dust mass, far-infrared luminosity and SFR. By using the best-fitting modified 
blackbody functions, we estimated dust mass, far-infrared luminosity and SFR. 
Dust mass is derived by $M_{\rm dust} = S_{\rm obs} D_L^{2} / [(1+z)\, \kappa_d(\nu_{\rm rest})\, B(\nu_{\rm rest}, T_{\rm dust})]$ (ref. 41), 
where $S_{\rm obs}$ is the observed flux density, $\nu_{\rm rest}$ is the rest frequency, $\kappa_d(\nu_{\rm rest})$ is the 
rest-frequency mass absorption coefficient and $B(\nu_{\rm rest}, T_{\rm dust})$ is the Planck function. 
We assume that the absorption coefficient varies as $\kappa_d(\nu) \propto \nu^{\beta}$ and $\kappa_d$(125 μm) = 
26.4 cm² g⁻¹ (ref. 42). The derived dust mass is (4.8 ± 1.0) × 10⁷ M⊙ and 
(2.9 ± 0.9) × 10⁷ M⊙ for the GRB 020819B site and the GRB 051022 host, respectively. 
If we use the dust temperature of 52.6 K for the GRB 051022 host estimated in 
a previous work, the derived dust mass would be about a factor of two lower, 
which has no effect on the discussion in the main text. Far-infrared luminosity 
$L_{\rm FIR}$ is derived from $L_{\rm FIR} = 4\pi M_{\rm dust} \int_0^{\infty} \kappa_d(\nu) B(\nu, T_{\rm dust})\, d\nu$ (ref. 41). The derived 
far-infrared luminosity is (1.1 ± 0.2) × 10¹¹ L⊙ and (1.9 ± 0.6) × 10¹¹ L⊙ for the 
GRB 020819B site and the GRB 051022 host, respectively. SFR is derived from the 
far-infrared luminosity as follows: SFR (in M⊙ yr⁻¹) = 1.72 × 10⁻¹⁰ L_FIR (in L⊙) 
(ref. 43), and calculated to be 18 ± 4 M⊙ yr⁻¹ and 32 ± 10 M⊙ yr⁻¹ for the 
GRB 020819B site and the GRB 051022 host, respectively. The comparison of CO 
and far-infrared luminosities is shown in Extended Data Fig. 2. 
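
The three relations in this section can be strung together in a short sketch for the GRB 020819B site. Several inputs are assumptions rather than values from the paper: the continuum is taken to be observed at exactly 1.2 mm, the luminosity distance of about 2,225 Mpc is what the adopted flat cosmology gives at z = 0.41, the integral is evaluated numerically between 10⁹ Hz and 10¹⁴ Hz, and the physical constants and function names are the author of this sketch's choices.

```python
import math

H_PLANCK = 6.62607015e-34   # J s
K_BOLTZ = 1.380649e-23      # J / K
C_M_S = 2.99792458e8        # m / s
MPC_M = 3.0857e22           # metres per megaparsec
M_SUN_KG = 1.989e30
L_SUN_W = 3.828e26
NU_125UM = C_M_S / 125.0e-6  # reference frequency for kappa_d

def planck(nu, t):
    """Planck function B_nu in W m^-2 Hz^-1 sr^-1."""
    x = H_PLANCK * nu / (K_BOLTZ * t)
    if x > 700.0:  # avoid overflow far into the Wien tail
        return 0.0
    return 2.0 * H_PLANCK * nu ** 3 / C_M_S ** 2 / math.expm1(x)

def kappa_d(nu, beta):
    """Mass absorption coefficient in m^2 kg^-1 (26.4 cm^2 g^-1 at 125 um)."""
    return 2.64 * (nu / NU_125UM) ** beta

def dust_mass_msun(s_obs_mjy, nu_obs_hz, z, d_l_mpc, t_dust, beta):
    s_obs = s_obs_mjy * 1.0e-29          # mJy -> W m^-2 Hz^-1
    d_l = d_l_mpc * MPC_M
    nu_rest = nu_obs_hz * (1.0 + z)
    m_kg = s_obs * d_l ** 2 / ((1.0 + z) * kappa_d(nu_rest, beta)
                               * planck(nu_rest, t_dust))
    return m_kg / M_SUN_KG

def fir_luminosity_lsun(m_dust_msun, t_dust, beta,
                        nu_min=1.0e9, nu_max=1.0e14, steps=4000):
    """4 pi M_dust times the integral of kappa_d * B_nu, by trapezoid in ln(nu)."""
    log_min, log_max = math.log(nu_min), math.log(nu_max)
    dlog = (log_max - log_min) / steps
    total = 0.0
    for i in range(steps + 1):
        nu = math.exp(log_min + i * dlog)
        f = kappa_d(nu, beta) * planck(nu, t_dust) * nu  # d(nu) = nu d(ln nu)
        total += f if 0 < i < steps else 0.5 * f
    return 4.0 * math.pi * m_dust_msun * M_SUN_KG * total * dlog / L_SUN_W

def sfr_msun_yr(l_fir_lsun):
    return 1.72e-10 * l_fir_lsun

m_dust = dust_mass_msun(0.14, C_M_S / 1.2e-3, 0.410, 2225.0, 28.0, 1.9)
l_fir = fir_luminosity_lsun(4.8e7, 28.0, 1.9)  # using the published dust mass
print(f"{m_dust:.2e}", f"{l_fir:.2e}", round(sfr_msun_yr(l_fir), 1))
```

The results land within roughly 10% of the published dust mass, far-infrared luminosity and SFR; exact agreement is not expected, because the effective continuum frequency and the integration limits used by the authors are not stated here.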
Extended Data Figure 1 | Spectral energy distribution of the GRB 020819B 
host and the GRB 051022 host. The red squares show ALMA 1.2-mm data. 
Black squares represent photometry from the literature and the 
publicly archived data of Herschel. Dashed curves show the best-fit modified 
blackbody functions. The arrows represent 3σ upper limits. For comparison, we 
plot SED models of Arp 220, M82, NGC 6946 and M51 (ref. 47). The SED 
models are scaled to the flux density of ALMA data. 
Extended Data Figure 2 | Comparison of CO and far-infrared luminosities. 
The GRB 020819B host and the GRB 051022 host are plotted with 1σ 
uncertainties (red and blue squares). To examine the properties of the GRB host 
galaxies as a whole and to compare with previous studies, we plot our data 
without separating the nuclear region and the explosion site for the 
GRB 020819B host galaxy. Various galaxy populations are also plotted: local 
spirals (circles), local luminous infrared galaxies (LIRGs) (plus symbols) 
and ultraluminous infrared galaxies (ULIRGs) (crosses), z = 0.2–1 
ULIRGs (diamonds), z ~ 1–2 normal star-forming galaxies (pentagons), 
submillimetre-luminous galaxies (up-triangles), QSOs and radio galaxies 
(down-triangles). The grey solid and dashed lines represent the sequence of 
normal star-forming galaxies and starburst galaxies, respectively. 
Nanotwinned diamond with unprecedented hardness and stability 

Quan Huang, Dongli Yu, Bo Xu, Wentao Hu, Yanming Ma, Yanbin Wang, Zhisheng Zhao, Bin Wen, Julong He, 
Zhongyuan Liu & Yongjun Tian 


Although diamond is the hardest material for cutting tools, poor 
thermal stability has limited its applications, especially at high temperatures. 
Simultaneous improvement of the hardness and thermal 
stability of diamond has long been desirable. According to the 
Hall–Petch effect, the hardness of diamond can be enhanced by nanostructuring 
(by means of nanograined and nanotwinned microstructures), 
as shown in previous studies. However, for well-sintered nanograined 
diamonds, the grain sizes are technically limited to 10–30 nm 
(ref. 3), with degraded thermal stability compared with that of natural 
diamond. Recent success in synthesizing nanotwinned cubic boron 
nitride (nt-cBN) with a twin thickness down to ~3.8 nm makes it 
feasible to simultaneously achieve smaller nanosize, ultrahardness 
and superior thermal stability. At present, nanotwinned diamond 
(nt-diamond) has not been fabricated successfully through direct 
conversions of various carbon precursors (such as graphite, amorphous 
carbon, glassy carbon and C60). Here we report the direct synthesis 
of nt-diamond with an average twin thickness of ~5 nm, using a 
precursor of onion carbon nanoparticles at high pressure and high 
temperature, and the observation of a new monoclinic crystalline 
form of diamond coexisting with nt-diamond. The pure synthetic 
bulk nt-diamond material shows unprecedented hardness and thermal 
stability, with Vickers hardness up to ~200 GPa and an in-air 
oxidization temperature more than 200 °C higher than that of natural 
diamond. The creation of nanotwinned microstructures offers a 
general pathway for manufacturing new advanced carbon-based materials 
with exceptional thermal stability and mechanical properties. 

Diamond is the hardest, stiffest and least compressible crystalline 
material with exceptionally high thermal conductivity. Tools made of 
diamond are widely used for cutting and shaping hard substances such 
as stones, glasses and ceramics. However, diamond is energetically unstable 
relative to graphite under ambient conditions, with an inherent drawback 
of poor thermal stability. In air, the onset oxidation temperature 
is ~800 °C for natural diamond, resulting in the severe wear of diamond 
tools at high temperatures. 

The synthesis of materials harder than natural diamond has long been 
sought. The Hall–Petch relation offers a general pathway to enhancing 
hardness by decreasing characteristic size of microstructures (for 
example grain size or twin thickness). Nanograined diamond has been 
successfully synthesized through direct conversions of certain carbon 
precursors at high pressure and high temperature (HPHT). The pressure 
and temperature conditions needed to synthesize nanograined diamonds 
are much higher than those for growing single-crystal diamonds 
in the industry. High pressure is necessary to control grain size effectively 
by suppressing atomic diffusion, which promotes growth. Nanograined 
diamonds synthesized from pure graphite at 2,300–2,500 °C 
and 12–25 GPa reach a grain size of 10–30 nm, with a high Knoop 
hardness of 110–140 GPa (ref. 3) but a reduced onset oxidation temperature 
of ~680 °C in air. At lower temperatures (~1,800 °C), nanograined 
diamonds with a smaller grain size (5–10 nm) have been 
synthesized from C60, amorphous carbon and glassy carbon, but Knoop 
hardness decreases significantly to 70–86 GPa (ref. 6). The observed 
hardness deficiency seems to originate from intergranular fracturing 
along poorly sintered grain boundaries, rather than the reverse 
Hall–Petch effect resulting from grain-boundary sliding. Technically, the 
synthesis of well-sintered nanograined diamond while maintaining a 
smaller grain size remains a challenge. 

Nanotwinning is an effective mechanism for acquiring a smaller char- 
acteristic size of microstructure, because twin boundaries possess lower 
excess energy than grain boundaries. It has been verified experimentally 
that, at the nanoscale, twin boundaries show a hardening effect identical to
that of grain boundaries for metals’’’*. Ubiquitously nanotwinned structures
have been introduced into superhard materials through the successful
synthesis of nt-cBN with an average twin thickness of ~3.8 nm at 
HPHT?. These nt-cBN bulk samples have a superior combination of 
high hardness, high toughness and high thermal stability’. The syn- 
thesis of nt-diamond has not yet been reported but is highly desirable 
in view of the excellent performance of nt-cBN. 

Experience in the synthesis of nt-cBN through an onion-like BN pre- 
cursor suggested the use of onion carbon as precursor in the fabrication 
of nt-diamond. Onion carbon, a high-energy metastable carbon consist- 
ing of concentric graphite-like shells (Extended Data Fig. 1), is structur- 
ally similar to onion-like BN and can be produced in large amounts’’. A 
high concentration of puckered layers and stacking faults in onion car- 
bon may provide the key for the nucleation of nt-diamond at HPHT, as 
for nt-cBN°. In fact, isolated onion carbon particles have been observed 
to convert into diamond nanocrystals under intense electron irradiation 
even at ambient pressure”. 

The onion carbon nanoparticles (~20–50 nm in diameter) used in
our study were found by transmission electron microscopy (TEM)
to contain numerous puckered layers and stacking faults (Fig. 1a). X-ray
diffraction (XRD) characterization of the onion precursors and recovered 
samples after HPHT treatments is presented in Extended Data Figs 2 
and 3. The inter-shell spacings of untreated onion carbon were centred 
on 0.3485 nm. When treated below 10 GPa and 2,000 °C, onion carbon 
retained the original nested crystal structure. Samples recovered from 
10–15 GPa and 1,400–1,850 °C were black and opaque (Fig. 1b inset),
and contained cubic diamond and an unidentified carbon phase. This
latter phase has not been observed before and seems to be inherently
related to the specific structural transformation of onion carbon precursors
at HPHT. Transparent samples were recovered from 18–25 GPa
and 1,850–2,000 °C, with pure cubic diamond as indicated by the XRD
patterns. The synthesis temperature of cubic diamond from onion carbon
was ~450 °C lower than that from graphite**, allowing easier industrial
fabrication. 

Typical TEM and high-resolution TEM (HRTEM) images of a black
opaque sample (synthesized at 10 GPa and 1,850 °C) are shown in
Fig. 1b, c and Extended Data Fig. 4a, b. Cubic diamond was the dominant
phase, with lamellar {111} nanotwins. The new secondary carbon
phase was clearly seen with HRTEM. The d spacings deduced from
selected-area electron diffraction (SAED) patterns (Extended Data Fig. 4d–f)
and XRD data (Extended Data Table 1) of this new phase do not match
any reported carbon phase. The new phase (denoted M-diamond) had
a monoclinic structure with lattice parameters of a = 0.436 nm, b =
0.251 nm, c = 1.248 nm and β = 90.9°. All the C–C bonds were sp³
hybridized, as indicated by the electron energy loss spectrum measurements
(Extended Data Fig. 4c), similar to those in cubic diamond. In
the TEM images, thin, elongated (and occasionally polygonal) M-diamond
domains intersected adjacent nanotwinned cubic diamond
(C-diamond) domains, forming coherent boundaries parallel to the diamond
(111) planes. The orientation relations between M-diamond (M)
and C-diamond (C) as determined from SAED were M(001)//C(111)
and M[010]//C[011] (Extended Data Fig. 4d–f).

Figure 1 | Onion carbon nanoparticles and a bulk sample synthesized at
10 GPa and 1,850 °C. a, HRTEM image of onion carbon nanoparticles.
b, TEM image of the sample showing nanotwinned microstructure. Inset:
photograph of the black opaque sample (~2 mm in diameter). c, HRTEM
image of the area marked with the red box in b. Two adjacent cubic diamond
(C) domains form a {111} twin boundary (TB). Several M-diamond (M)
domains are associated with cubic diamond twins containing stacking faults
(SFs). Fast Fourier transforms of M-diamond and cubic diamond, shown in the
upper and lower insets, respectively, indicate that lattices of M-diamond and
cubic diamond are coherent.

The HRTEM images of a transparent pure nt-diamond sample (synthesized
at 20 GPa and 2,000 °C; Fig. 2a inset) revealed that C-diamond
contained a high density of lamellar {111} nanotwins (Fig. 2a, b). Unlike
nt-cBN, in which individual nanograins can be clearly characterized’,
high-angle grain boundaries in nt-diamond (Fig. 2b) were frequently
interrupted by interlocked areas where adjacent nanocrystals intersected
and merged, making it difficult to determine individual nanograins
unambiguously. The nanotwins were predominantly thinner 
than 10 nm. Figure 2c shows a twin thickness distribution derived from 
444 nanotwins on the basis of HRTEM measurements. The average 
thickness, ~5 nm, is the smallest microstructural size so far achieved in 
diamonds. In our transparent nt-diamond samples, stacking faults were 
also observed in nanotwins (Fig. 2b and Extended Data Fig. 5). These 
stacking faults, due to extensive twinning, altered the stacking sequence 
of (111) planes in diamond” and produced weak shoulders of the strong 
(111) reflection in the XRD patterns (Extended Data Fig. 2). These ob- 
served planar faults together with the secondary phase of M-diamond 
also caused the asymmetries in both the (111) and the (220) peaks of 
diamond (Extended Data Figs 2 and 3). 

Figure 2 | A nt-diamond bulk sample synthesized at 20 GPa and 2,000 °C.
a, TEM image of nanotwinned microstructure. Inset: photograph of the
transparent sample (~1 mm in diameter). b, HRTEM image of intersecting
nanotwins (marked with the red box in a), viewed along the [101] zone axis of
diamond. Lamellar {111} nanotwins, stacking faults and residual M-diamond
(trace) are present. Twin boundaries are marked with red arrows. Grain
boundaries (GB) are interrupted by interlocked twins. Inset: SAED pattern
corresponding to the central area of a. The four-fold-like pattern is from the
twin domains with four different orientations. c, Thickness distribution of the
nanotwins measured from HRTEM images. The average twin thickness is
~5 nm.

A hardness value should be determined by the asymptotic region of
the hardness-load curve'*’’. We found that our samples reached asymptotic
hardness at a load of 4.9 N (Fig. 3a). Vickers and Knoop hardnesses
measured at 4.9 N for six different transparent pure nt-diamond samples
(Fig. 3b and Extended Data Table 2) showed unprecedentedly high
values: 175–203 and 168–196 GPa, respectively. Two high loads of 9.8 
and 19.6 N were applied to create cracks for fracture toughness deter- 
mination. The determined fracture toughness values ranged from 9.7 
to 14.8 MPa m^0.5 (Fig. 3c and Extended Data Table 2). Meaningful indentation
hardness can be measured reliably as long as the shear strength
of the sample is smaller than the compressive strength of the diamond
indenter’®; this requirement was satisfied because no visible plastic deformation
of the diamond indenter tip was observed after measurements 
of hardness and fracture toughness (Extended Data Fig. 6). Both the 
achieved hardness and the trade-off between hardness and toughness 
of our nt-diamond samples are significantly superior to those of other 
popular tool materials, such as cobalt-bonded tungsten carbide (Co-WC)” 
and previously reported diamond-related materials’**" (Fig. 3b, d), yield- 
ing diamonds with unsurpassed mechanical properties. The simultaneous 
improvement in hardness and fracture toughness in our nt-diamond 
is intimately related to the ubiquitous nanotwinning microstructure. 
The presence of ultrafine nanotwins introduces extra hardening, which 
is probably due to both the Hall-Petch and quantum confinement ef- 
fects at nanoscale”, while gliding of dislocations along densely distributed 
twin boundaries enhances fracture toughness”. Our results demonstrate 
that the old paradigm (the higher the hardness of a material, the lower
the fracture toughness) can be broken through processes of controlled
nanotwinning in covalent materials.

Figure 3 | Typical mechanical properties of nt-diamond and its comparison
with other tool materials. a, H_V of nt-diamond and natural diamond crystal as
a function of applied load (F). Beyond 4.9 N, H_V decreases to the asymptotic
values of ~200 GPa for nt-diamond (red line). For natural diamond crystals,
our measured H_V values are ~110 GPa on the {110} face (blue line) and
~62 GPa on the {111} face (pink line). Error bars indicate 1 s.d. (n = 5). Inset:
plot of H_K against F for nt-diamond. b, c, H_V (b) and K_Ic (c) for different
tool materials, including nt-diamond (nt-D), nanograined diamond (ng-D;
grain size 10–30 nm)’, single-crystal diamond (SC-D)’*, cobalt-bonded
polycrystalline diamond (Co-PCD)”’ and Co-WC”. d, Plot of H_V against K_Ic
for nt-diamond in comparison with available data on other forms of diamond.
The data for nt-diamond are shown as solid red circles above the shaded
envelope. The published data are from representative diamond materials,
including type Ia natural SC-D (open upward triangles'*), IIa natural SC-D
(open squares'*), HPHT-grown SC-D (open downward triangles'*), CVD-grown
SC-D (open hexagons"), annealed IIa natural SC-D (filled squares’*),
CVD-grown SC-D annealed at HPHT (filled hexagons"*), Co-PCD (large grey
circle’), CVD-grown PCD (large pink oval”) and aggregated diamond rod
(Knoop hardness, filled upward triangle’'). The hardness of nanograined
diamond reaches 110–140 GPa (ref. 3), but no fracture toughness data
were reported. Those data are therefore not included in the figure.

The thermal stability of different pure nt-diamond samples was characterized
by thermogravimetry curves measured in air. At a heating rate
of 5 °C min⁻¹, the onset oxidation temperatures of nt-diamond and
natural diamond were ~980 and ~770 °C (Fig. 4a), respectively. Extended
Data Fig. 7 compares the thermal stability of nt-diamond with
other tool materials measured at a heating rate of 10 °C min⁻¹. The
onset oxidation temperature of nt-diamond (~1,056 °C) was again much
higher than those of natural diamond (~805 °C), synthetic diamond
powders (~725 °C), nanograined diamond (~680 °C)* and Co-WC
(800–1,000 °C)’ (Fig. 4b), and even rivalled that of ng-cBN (~1,187 °C)*. 
The oxidation of diamond generally has two simultaneous processes’, 
namely the oxidation of graphitized diamond and the oxidation of dia- 
mond itself. Previous experiments have shown that the oxidation tem- 
perature of graphite in air is ~50°C lower than that of diamond”’. 
According to the size-dependent pressure—temperature phase diagram 
derived from nanothermodynamic theory”, diamond becomes energet- 
ically stable over graphite at deep nanometre scale (~5 nm). This would 
certainly delay the graphitization of nt-diamond and would result in a 
higher oxidation temperature. Moreover, compressive stress introduces 
additional resistance to the oxidation of diamond. Given that the inter- 
nal stress induced by nanotwinning boundaries increases with reduced 
twin thickness”, the oxidation process of nt-diamond may be retarded 
because of the presence of ultrafine nanotwins. Differential scanning 
calorimetry (DSC) measurements provided further evidence that thin- 
ner nanotwins result in an even higher oxidation temperature of ~1,300 °C 
(Extended Data Fig. 7a), consistent with the aforementioned specu- 
lation. Thus, both mechanical properties and thermal stability depend 
primarily on the achieved average twin thickness. 

The successful syntheses of nt-diamond and nt-cBN show that nano- 
twinning microstructure is an effective route for simultaneously enhan- 
cing the hardness, fracture toughness and thermal stability of superhard 
materials. Our experimental results on nt-diamond further confirm 
that there is continuous hardening at nanotwinning sizes down to ~5 nm, 
which agrees with previous results on nt-cBN® but is in stark contrast with 
the sharp softening of metals at these nanometre scales. We therefore 
predict that pursuing microstructures with thinner nanotwins may
lead to covalent materials with even better properties.
Here it may be instructive to estimate the lower limit of nanotwin
thickness and the corresponding ultimately achievable hardness (H_Vu)
of diamonds. If we take {111} twins in nt-diamond as the model system,
the estimated minimal twin thickness, λ_min, is 3d_111 = 0.618 nm
because the atomic stacking sequence along the ⟨111⟩ direction of diamond
(lattice parameter a = 0.3568 nm) is ...ABCABC... (Extended
Data Fig. 8). Assuming that the Hall–Petch effect is no longer applicable
at such a scale*’’, H_Vu for nt-diamond is estimated with the following
formula according to our hardness model*”*: $H_{Vu} = H_0 + k_{qc}/\lambda_{min}$,
where H_0 is the hardness of single-crystal diamond (~90 GPa)
and $k_{qc} = 211N_e^{1/3} = 187.7$ GPa nm is the quantum confinement hardening
coefficient for a covalent crystal’*, which scales with the cube root of the
valence electron density N_e (0.705, ref. 29). Thus, H_Vu for nt-diamond
is 394 GPa. Synthesizing nanotwinned microstructures with the twin
thickness required for such exceptional hardness remains a technical
challenge.

Figure 4 | Typical thermal stability of a nt-diamond sample. a, Comparison
of the onset oxidation temperatures of a nt-diamond bulk sample (red) with a
natural diamond crystal (cyan). Both thermogravimetry (TG; top) and DSC
(bottom) curves were measured in air at a heating rate of 5 °C min⁻¹. The onset
oxidation temperature of the nt-diamond (980 °C from thermogravimetry or
960 °C from DSC) was more than 200 °C higher than that of the natural
diamond (770 °C from thermogravimetry or 720 °C from DSC). b, Comparison
of working temperatures (T,) in air of nt-diamond with other tool materials,
including ng-D*, SC-D*', Co-PCD? and Co-WC””.
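The estimate of the minimal twin thickness and the ultimate hardness given in the text can be checked numerically. The sketch below is only an illustration: the function and variable names are ours, the {111} spacing is taken from the standard cubic relation d = a/√3, and the units of the valence electron density (electrons per cubic ångström) are an assumption, since the text quotes only the number.

```python
import math

# Values quoted in the text; names are illustrative, not the authors' own.
A_DIAMOND_NM = 0.3568   # cubic lattice parameter a of diamond
H0_GPA = 90.0           # approximate hardness of single-crystal diamond
N_E = 0.705             # valence electron density (units assumed: electrons per cubic angstrom)


def d111_nm(a_nm: float) -> float:
    """Spacing of {111} planes in a cubic lattice: d = a / sqrt(1 + 1 + 1)."""
    return a_nm / math.sqrt(3.0)


def ultimate_hardness_gpa(h0_gpa: float, n_e: float, twin_nm: float) -> float:
    """H_Vu = H_0 + k_qc / lambda, with k_qc = 211 * N_e**(1/3) in GPa nm."""
    k_qc = 211.0 * n_e ** (1.0 / 3.0)
    return h0_gpa + k_qc / twin_nm


# Three {111} layers make up one ABC stacking repeat.
lambda_min_nm = 3.0 * d111_nm(A_DIAMOND_NM)
h_vu_gpa = ultimate_hardness_gpa(H0_GPA, N_E, lambda_min_nm)

print(f"lambda_min = {lambda_min_nm:.3f} nm")
print(f"H_Vu       = {h_vu_gpa:.0f} GPa")
```

With the quoted inputs this reproduces the 0.618 nm and 394 GPa stated in the text. Note that the expression contains only the quantum confinement term, in line with the stated assumption that Hall–Petch hardening no longer applies at this scale.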

Finally, the experimental HPHT conditions for synthesizing nt-diamond
and nt-cBN are essentially identical. This opens up the possibility
of manufacturing nt-diamond/nt-cBN composites. Such nanotwinned
composites are expected to possess intermediate oxidation temperature
and hardness between those of nt-diamond and nt-cBN but with 
greater fracture toughness as a result of the combined contributions 
from nanotwinning and composite effects. 


METHODS SUMMARY 


We fabricated onion carbon particles with diameters of ~20–50 nm by using black
carbon powders through an impinging-streams technology*®. HPHT experiments
were performed with a 10-MN double-stage large-volume multi-anvil system with
the standard COMPRES 10/5 sample assembly consisting of a 10-mm spinel (MgAl₂O₄)
+ MgO octahedron with a Re heater and a LaCrO₃ thermal insulator. Temperature
was measured with type C W-Re thermocouples, and pressure was estimated from
previously obtained calibration curves at different temperatures for the multi-anvil
apparatus’. Recovered samples were ~1–2 mm in diameter and 0.2–0.5 mm in
height. Microstructures were investigated with a transmission electron microscope
(JEM-2010) with an accelerating voltage of 200 kV. Component phases were identified
by TEM and XRD (Cu Kα; D8 Discover). A microhardness tester (KB 5 BVZ)
was used to measure H_V and K_Ic with a diamond Vickers indenter as well as H_K with
a diamond Knoop indenter. H_V was determined from $H_V = 1{,}854.4F/L_1^2$, where F
(in newtons) is the applied load and L_1 (in micrometres) is the arithmetic mean of
the two diagonals of the Vickers indentation. H_K was determined from
$H_K = 14{,}228.9F/L_2^2$, where L_2 (in micrometres) is the longer diagonal of the Knoop
indentation. Five hardness data points were obtained at each load, and the hardness
values were determined from the asymptotic-hardness region. K_Ic was calculated
from $K_{Ic} = 0.016(E/H_V)^{0.5}F/C^{1.5}$ for radial cracks formed in the bulk nt-diamond
sample’®, where C (in micrometres) is the average length of the radial cracks measured
from the indent centre, and E = 1,000 GPa is Young’s modulus of diamond’*.
The presented K_Ic values were averaged over three data points determined at loads
of 9.8 and 19.6 N. Oxidation resistance was studied by measuring thermogravimetry
and DSC curves in air, using NETZSCH STA 449 C over the temperature range
20–1,500 °C.
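The hardness and toughness relations in the Methods Summary are simple enough to evaluate directly. The following is a minimal sketch under stated assumptions, not the authors' analysis code: the function names and the indentation sizes are hypothetical, and the unit handling for K_Ic (converting the crack length to metres and working in SI before reporting MPa m^0.5) is our assumption, because the text gives C in micrometres without stating the conversion.

```python
import math


def vickers_hardness_gpa(load_n: float, mean_diagonal_um: float) -> float:
    """H_V = 1,854.4 F / L1^2, with F in newtons and L1 in micrometres (GPa)."""
    return 1854.4 * load_n / mean_diagonal_um**2


def knoop_hardness_gpa(load_n: float, long_diagonal_um: float) -> float:
    """H_K = 14,228.9 F / L2^2, with F in newtons and L2 in micrometres (GPa)."""
    return 14228.9 * load_n / long_diagonal_um**2


def fracture_toughness_mpa_sqrt_m(
    load_n: float,
    crack_length_um: float,
    hv_gpa: float,
    youngs_gpa: float = 1000.0,
) -> float:
    """K_Ic = 0.016 (E/H_V)^0.5 F / C^1.5, evaluated in SI units.

    Assumption: C is converted from micrometres to metres so that the
    result comes out in Pa m^0.5, which is then scaled to MPa m^0.5.
    """
    crack_length_m = crack_length_um * 1e-6
    k_pa_sqrt_m = (
        0.016 * math.sqrt(youngs_gpa / hv_gpa) * load_n / crack_length_m**1.5
    )
    return k_pa_sqrt_m / 1e6


# Hypothetical measurements, chosen only to illustrate the formulas.
hv = vickers_hardness_gpa(load_n=4.9, mean_diagonal_um=6.74)
hk = knoop_hardness_gpa(load_n=4.9, long_diagonal_um=19.0)
kic = fracture_toughness_mpa_sqrt_m(load_n=9.8, crack_length_um=10.0, hv_gpa=hv)

print(f"H_V  = {hv:.1f} GPa")
print(f"H_K  = {hk:.1f} GPa")
print(f"K_Ic = {kic:.1f} MPa m^0.5")
```

The illustrative inputs were picked so that the results fall within the ranges reported for the nt-diamond samples; they are not measured values from the study.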


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper.

Extended Data Figure 1 | Schematic icosahedral model of a ten-shell onion
carbon. The icosahedral-quasicrystal-like model of an onion carbon particle
was relaxed from a nested buckyonion of C60, C240, C540, C960, C1,500, C2,160,
C2,940, C3,840, C4,860 and C6,000. This model was constructed with the same
classical molecular dynamics technique as that used in our previous work’. The
spacings between adjacent shells in the model vary from ~0.300 nm to
~0.340 nm.

Extended Data Figure 2 | Phase transformation of onion carbon compacts
at HPHT. XRD patterns of onion carbon precursor (Raw) and seven samples
recovered from different conditions indicated by P (in GPa)-T (in °C) pairs.
The inter-shell spacing of the starting onion carbon nanoparticles is
~0.3485 nm. For the two samples recovered from 8 GPa/2,000 °C and 15 GPa/ 
1,200 °C, the onion carbon structure does not show significant alteration except 
that the inter-shell spacing decreases to 0.3305 and 0.3361 nm, respectively. 
Cubic diamond appears when the applied pressure is more than 10 GPa and 
temperature is more than 1,400 °C, with an accompanying new carbon phase 
recognized in the black opaque samples synthesized at 1,850 °C or below. A 
small amount of residual onion carbon can be detected in the sample recovered 
from 15 GPa/1,400 °C. At pressures of 18-25 GPa and temperatures of 1,850- 
2,000 °C, the recovered samples changed from translucent to transparent, 
and only the diffraction peaks of cubic diamond can be seen in XRD patterns. 
Weak shoulders of the (111) peaks of diamond (red arrows) appear in three 
samples synthesized at pressures of 18–20 GPa and temperatures of
1,850–1,950 °C. Asymmetry in the (111) and (220) peaks of diamond was
often observed in the samples synthesized at pressures below 20 GPa and
temperatures below 1,950 °C.

Extended Data Figure 3 | XRD patterns of a sample recovered from 10 GPa 
and 1,850 °C. All the recorded d spacings of visible diffraction peaks are listed 
in Extended Data Table 1. Insets: two peaks overlapping the cubic diamond 
reflections. Most of these extra reflections can be indexed with a monoclinic 
structure (M-diamond) as shown in Extended Data Table 1.

Extended Data Figure 4 | TEM images, electron energy loss spectrum
(EELS) and SAED measurements on a sample recovered from 10 GPa and
1,850 °C. a, TEM image showing interlaced twins. b, HRTEM image
corresponding to the area in the red box in a. A monoclinic M-diamond (M)
domain is observed between two cubic diamond (C) domains. c, EELS spectra
of M and C phases. All the C–C bonds are sp³ hybridized in both M and C
phases. d–f, SAED patterns along the [010], [150] and [130] zone axes of M, 
respectively, recorded by rotating an M crystal. (111) and (200) spots of the 
twinned C phase, overlapping with some spots of the M phase as a result of 
coherent growth, are marked by red circles and boxes, respectively. The 
determined orientation relations between M and C phases are M(001)//C(111) 
and M[010]//C[011]. 


©2014 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Figure 5 | HRTEM observations of three nt-diamond bulk 
samples synthesized at different HPHT conditions. a-c, HRTEM and 
corresponding TEM (inset) images of three representative samples, O-366 
(a), P-368 (b) and M-363 (c) as listed in Extended Data Table 2. TBs are marked 
with red arrows. The measured average twin thicknesses are ~5.2 nm for 
sample P-368, ~5.4 nm for sample O-366 and ~7.9 nm for sample M-363; the
smaller the average twin thickness, the higher the hardness. The full width at
half-maximum (FWHM) of the (111) peak is mainly related to the nanograin
size: samples O-366 and P-368 have a larger FWHM as a result of their smaller
nanograin size. Both pressure and temperature can promote the phase
transformation of onion carbon to diamond. The probability of stacking faults 
and the volume fraction of M-diamond decrease with elevated synthesis 
temperature and pressure, as confirmed by our HRTEM observation. The 
abundant stacking faults in the nanotwins result in the appearance of a shoulder 
near the (111) peak (Extended Data Fig. 2), for example in the XRD pattern of 
sample O-366. The asymmetries of the (111) and (220) peaks of diamond 
shown in Extended Data Fig. 2 can be attributed to planar faults and the 
secondary phase in microstructure. On the one hand, a twin fault can itself 
produce peak asymmetry; on the other, M-diamond also contributes to peak 
asymmetry because of peak overlap, as demonstrated in Extended Data Fig. 3.

Extended Data Figure 6 | Comparison of Vickers indenter tip before and
after hardness and fracture toughness tests of nt-diamond. a, b, Scanning
electron microscopy images of the square pyramid diamond tip before (a) and
after (b) the tests of nt-diamond. A load of 9.8 N was used during the hardness
and toughness tests. As shown in b, the indenter, with a dark imprint of
~6.9 µm × ~6.9 µm on the tip matching the permanent indentation on the
tested nt-diamond, shows no visible plastic deformation. c, d, Photographs of
indentations on the standard calibration block supplied with the microhardness
tester KB 5 BVZ. The indentations were formed at a load of 1.96 N before
(c) and after (d) the tests, with the same tip as shown in a and b. The indenter tip 
produced an almost identical indentation (or standard hardness value) on the 
calibration block after the nt-diamond tests. These calibration results ensured 
the accuracy, repeatability and reliability of the unprecedented hardness and 
exceptional toughness values of nt-diamond reported in the present study.

Extended Data Figure 7 | Comparison of in-air oxidation resistance of bulk
nt-diamond with other diamonds measured at a heating rate of
10 °C min⁻¹. a, Comparison of the onset oxidation temperatures determined
from measured thermogravimetry curves. The onset temperature was
~1,056 °C for a bulk nt-diamond, ~805 °C for a natural diamond crystal,
~725 °C for synthetic diamond powders and ~680 °C for a nanograined
diamond’. b, Comparison of the onset oxidation temperatures determined
from the exothermic trough in the measured heat flow curves of DSC. The onset
temperature was ~1,035 °C for the nt-diamond, ~750 °C for the natural 
diamond and ~705 °C for the synthetic diamond. The exothermic peaks 
located at 1,280 °C and 1,320 °C for the nt-diamond were probably due to 
the presence of finer nanotwins. The above-measured oxidation temperatures 
are consistent with those determined from the corresponding 
thermogravimetry curves.

Extended Data Figure 8 | Atomic arrangements of a {111} Σ = 3 twin
boundary in cubic diamond. The twin boundary is projected along the ⟨011⟩
direction. Because of the stacking sequence of ABC for diamond structure, the
minimum twin thickness is 3d_111, where d_111 is the planar distance along the
direction of ⟨111⟩ in the unit cell of cubic diamond. 




Extended Data Table 1 | Comparison of d spacings (d_hkl) observed from XRD and SAED with those of proposed M-diamond structure and
cubic diamond

Sample                                    | M-diamond               | Cubic diamond (JCPDS 6-675)
d_obs XRD (nm)  I/I0    d_obs SAED (nm)   | d_cal (nm)  hkl         | d (nm)   hkl      I/I0
-               -       0.625             | 0.6239      (0 0 2)     | -        -        -
0.3217          0.25    0.313             | 0.3120      (0 0 4)     | -        -        -
0.2672*         1.05    -                 | -           -           | -        -        -
0.2239*         0.78    -                 | -           -           | -        -        -
0.2182          0.50    0.218             | 0.2179      (2 0 0)     | -        -        -
0.2176          0.40    0.212             | 0.2142      (1 -1 1)    | -        -        -
0.2087          1.56    0.209             | 0.2080      (0 0 6)     | -        -        -
0.2062          100     0.206             | 0.2068      (2 0 -2)    | 0.206    (1 1 1)  100
-               -       0.205             | 0.2048      (2 0 2)     | -        -        -
-               -       0.180             | 0.1800      (2 0 -4)    | -        -        -
0.1707          0.92    -                 | 0.1774      (2 0 4)     | -        -        -
0.1671          0.52    -                 | 0.1647      (1 1 -5)    | -        -        -
0.1515          0.27    0.154             | 0.1517      (2 0 -6)    | -        -        -
0.1481          0.28    -                 | 0.1493      (2 0 6)     | -        -        -
0.1279          0.53    0.128             | 0.1278      (2 0 -8)    | -        -        -
0.1263          23.34   -                 | -           -           | 0.1261   (2 2 0)  25
0.1260          2.58    0.126             | 0.1259      (2 0 8)     | -        -        -
-               -       0.126             | 0.1258      (3 -1 0)    | -        -        -
-               -       0.110             | 0.1090      (4 0 0)     | -        -        -
0.1090          0.25    0.109             | 0.1090      (2 0 -10)   | -        -        -
-               -       0.108             | 0.1083      (3 -1 -6)   | -        -        -
0.1077          9.36    0.108             | 0.1076      (2 0 10)    | -        -        -
0.1074          2.49    0.108             | 0.1076      (4 0 -2)    | 0.10754  (3 1 1)  16
-               -       0.107             | 0.1071      (4 0 2)     | -        -        -
-               -       0.107             | 0.1070      (3 -1 6)    | -        -        -
-               -       0.103             | 0.1034      (4 0 -4)    | -        -        -
-               -       0.102             | 0.1024      (4 0 4)     | -        -        -
-               -       0.097             | 0.0972      (4 0 -6)    | -        -        -
-               -       0.095             | 0.0961      (4 0 6)     | -        -        -
-               -       0.091             | 0.0900      (4 0 -8)    | -        -        -
-               -       0.089             | 0.0887      (4 0 8)     | 0.08916  (4 0 0)  8
-               -       0.082             | 0.0818      (5 -1 -3)   | 0.08182  (3 3 1)  16
-               -       0.081             | 0.0807      (3 -1 12)   | -        -        -
-               -       0.081             | 0.0806      (5 -1 3)    | -        -        -
-               -       0.080             | 0.0796      (3 -1 -12)  | -        -        -
-               -       0.075             | 0.0744      (5 -1 -7)   | -        -        -
-               -       -                 | -           -           | 0.07281  (4 2 2)  -
-               -       -                 | -           -           | 0.06864  (5 1 1)  -

The sample was synthesized at 10 GPa and 1,850 °C. The d_cal values were calculated with the monoclinic structural parameters a = 0.436 nm, b = 0.251 nm, c = 1.248 nm and β = 90.9°. Asterisks indicate
unknown peaks. Empty cells are shown as -.
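The calculated d spacings of M-diamond follow from the quoted monoclinic cell through the standard interplanar-spacing relation for a monoclinic lattice with unique axis b. The sketch below is an illustration only: the function name is ours, and the reading of flattened indices such as "20-2" as (2 0 -2) is an assumption about the original table layout. Because the lattice parameters are quoted to three decimals, agreement to about 0.0002 nm is the most that should be expected.

```python
import math


def d_monoclinic_nm(h, k, l, a, b, c, beta_deg):
    """Interplanar spacing of (hkl) in a monoclinic cell (unique axis b).

    1/d^2 = [h^2/a^2 + k^2 sin^2(beta)/b^2 + l^2/c^2
             - 2 h l cos(beta)/(a c)] / sin^2(beta)
    """
    beta = math.radians(beta_deg)
    sin_b, cos_b = math.sin(beta), math.cos(beta)
    inv_d2 = (
        h * h / a**2
        + k * k * sin_b**2 / b**2
        + l * l / c**2
        - 2.0 * h * l * cos_b / (a * c)
    ) / sin_b**2
    return 1.0 / math.sqrt(inv_d2)


# Lattice parameters of M-diamond as quoted in the text (nm, degrees).
M_DIAMOND = dict(a=0.436, b=0.251, c=1.248, beta_deg=90.9)

# A few (hkl) -> d_cal pairs read from Extended Data Table 1;
# the signs of the indices are our reading of the flattened table.
TABULATED = {
    (0, 0, 2): 0.6239,
    (2, 0, 0): 0.2179,
    (2, 0, -2): 0.2068,
    (2, 0, 2): 0.2048,
    (2, 0, -4): 0.1800,
    (4, 0, 0): 0.1090,
}

computed = {hkl: d_monoclinic_nm(*hkl, **M_DIAMOND) for hkl in TABULATED}

for hkl, d_table in TABULATED.items():
    print(f"{hkl}: computed {computed[hkl]:.4f} nm, tabulated {d_table:.4f} nm")
```

The small splitting between (2 0 -2) and (2 0 2) comes entirely from the cross term in cos β, which is why a β only 0.9° away from 90° is still resolvable in the diffraction data.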
Extended Data Table 2 | Vickers hardness H_V (GPa), Knoop hardness H_K (GPa) and fracture toughness K_Ic (MPa m^0.5) for six transparent pure
(XRD standard) nt-diamond bulk samples

Sample   P/T (GPa/°C)   FWHM of (111)   H_V (GPa)     H_K (GPa)     K_Ic (MPa m^0.5)
W-377    25/2,000       0.45            193.6±21.7    182.1±6.4     12.1±0.5
M-363    25/2,000       0.43            175.4±18.2    168.2±6.8     14.8±3.6
O-366    18/1,850       0.98            191.8±9.3     190.1±14.7    13.7±4.2
V-376    20/2,000       0.46            203.6±12.0    196.6±8.2     9.7±2.6
P-368    20/1,850       1.04            198.7±32.5    190.9±3.2     10.6±1.1
S-379    20/1,850       0.43            191.7±25.6    183.6±6.8     11.8±0.5

H_V and H_K values were measured at a fixed load of 4.9 N. The K_Ic values were measured at loads of 9.8 and 19.6 N. Error bars indicate 1 s.d. (n = 5 for H_V and H_K, and n = 3 for K_Ic). The FWHMs of (111) peaks in the
XRD patterns of nt-diamond samples are also listed.
Increased frequency of extreme Indian Ocean Dipole 
events due to greenhouse warming 


Wenju Cai, Agus Santoso, Guojian Wang, Evan Weller, Lixin Wu, Karumuri Ashok, Yukio Masumoto & Toshio Yamagata 


The Indian Ocean dipole is a prominent mode of coupled ocean- 
atmosphere variability’ “*, affecting the lives of millions of people in 
Indian Ocean rim countries* »’. In its positive phase, sea surface tem- 
peratures are lower than normal off the Sumatra-Java coast, but higher 
in the western tropical Indian Ocean. During the extreme positive- 
IOD (pIOD) events of 1961, 1994 and 1997, the eastern cooling 
strengthened and extended westward along the equatorial Indian 
Ocean through strong reversal of both the mean westerly winds and 
the associated eastward-flowing upper ocean currents’. This cre- 
ated anomalously dry conditions from the eastern to the central Indian 
Ocean along the Equator and atmospheric convergence farther west, 
leading to catastrophic floods in eastern tropical African countries'’** 
but devastating droughts in eastern Indian Ocean rim countries*7"*””. 
Despite these serious consequences, the response of pIOD events to 
greenhouse warming is unknown. Here, using an ensemble of cli- 
mate models forced by a scenario of high greenhouse gas emissions 
(Representative Concentration Pathway 8.5), we project that the fre- 
quency of extreme pIOD events will increase by almost a factor of 
three, from one event every 17.3 years over the twentieth century to 
one event every 6.3 years over the twenty-first century. We find that 
a mean state change—with weakening of both equatorial westerly winds 
and eastward oceanic currents in association with a faster warming 
in the western than the eastern equatorial Indian Ocean—facilitates 
more frequent occurrences of wind and oceanic current reversal. This 
leads to more frequent extreme pIOD events, suggesting an increas- 
ing frequency of extreme climate and weather events in regions affec- 
ted by the pIOD. 
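The "almost a factor of three" quoted above is the ratio of the two mean recurrence intervals. A minimal arithmetic sketch (variable names are ours; only the two intervals come from the text):

```python
# Mean recurrence intervals of extreme pIOD events quoted in the text (years).
baseline_interval_years = 17.3    # twentieth century
projected_interval_years = 6.3    # twenty-first century, RCP8.5 scenario

# Frequency is the inverse of the interval, so the frequency ratio
# is the baseline interval divided by the projected interval.
frequency_ratio = baseline_interval_years / projected_interval_years

events_per_century_baseline = 100.0 / baseline_interval_years
events_per_century_projected = 100.0 / projected_interval_years

print(f"frequency ratio: {frequency_ratio:.2f}")
print(f"events per century: {events_per_century_baseline:.1f} -> "
      f"{events_per_century_projected:.1f}")
```

Expressed per century, the projection corresponds to roughly six events rising to roughly sixteen.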

In austral winter and spring, southeasterly trade winds that feed the 
tropical convergence zone near the maritime continent are a feature of the 
southern tropical Indian Ocean. During a pIOD event, an initial cooling off 
Sumatra-Java, the eastern pole of the Indian Ocean dipole, suppresses local 
convection, inducing easterly wind anomalies and a shallowing thermo- 
cline. This promotes upwelling that in turn reinforces the initial cooling'*”’, 
a process referred to as Bjerknes feedback. The growth of cool anomalies 
causes a northwestward extension of the southeasterly trade winds’*”®, with 
anomalous easterlies along the equatorial Indian Ocean (Fig. 1a), where 
weak westerlies normally prevail. The change in wind promotes conver- 
gence, rainfall and warm anomalies in the equatorial western Indian Ocean. 
The altered circulations induce droughts and bushfires in eastern Asia and 
Australia®*, floods in parts of the Indian subcontinent'' and eastern 
Africa!*“, coral reef death across western Sumatra’’, and malaria outbreaks 
in eastern Africa’. During extreme pIOD events, as occurred in 1961, 1994 
and 1997, the anomalies, particularly the anomalous equatorial easterlies, 
are far stronger (Fig. 1b), with commensurately greater impacts. During the 
1997 event, devastating floods in Somalia, Ethiopia, Kenya, Sudan and 
Uganda caused several thousand deaths and displaced hundreds of thousands
of people. In contrast, Indonesia suffered severe droughts and wildfires*’*'”
made worse by the developing 1997 El Niño; the associated smoke
and haze caused severe health problems to tens of millions of people in 
Indonesia and surrounding countries””®. 

These dramatic impacts call for an urgent investigation into whether 
extreme pIOD events will change in a warmer climate. Recent studies 
have shown that greenhouse warming leads to a mean state change in the 
equatorial Indian Ocean with an easterly wind trend and a faster warming 
rate in the west than in the east, but referenced to the evolving mean state 
there is no detectable change in either the overall frequency or amplitude 
of pIOD events'*. Here, using a suite of distinct process-based indica- 
tors, we show that there is in fact a significant increase in the frequency of 
the extreme pIOD events under greenhouse warming. 

We characterize the observed extreme pIOD events in terms of their 
contrast with moderate events, focusing on austral spring, the season in 
which the IOD usually peaks. During extreme pIOD events, the cooling off 
Sumatra is intensified by the large equatorial easterly anomalies through 
generation of equatorial and coastally trapped upwelling Kelvin waves”””, 
enhanced evaporation’, and a weakening of the mean eastward oceanic 
flows that transport heat eastward towards Sumatra”. The anomalous 
convergence in the west, marked by increased rainfall and temperature, is 
amplified through a series of processes: reduced wind speed and evapora- 
tion associated with the downstream extension of the southeasterly trades; a 
deeper thermocline caused by the weaker eastward ocean heat transport 
along the Equator’; and generation of equatorial downwelling Rossby 
waves”*. The warming in the west and cooling in the east in turn strength- 
ens the equatorial easterly anomalies, introducing a positive feedback 
along the Equator that operates in addition to the Bjerknes feedback 
centred off Sumatra–Java. The equatorial positive feedback, which is
far stronger during extreme pIOD events, leads to stronger equatorial 
cooling (Extended Data Fig. 1a), and reversal of the equatorial winds and 
ocean currents so that they flow towards the west (Extended Data Fig. 2f). 
This creates a zone of atmospheric subsidence along the Equator char- 
acterized by low rainfall and colder sea surface temperatures (SSTs) that 
extend much farther to the west than during moderate pIOD events 
(Fig. 1; Extended Data Fig. 1a). 

A heat budget analysis for the eastern-to-central equatorial Indian Ocean 
during the IOD developing phase (July to October; Extended Data Figs 1 
and 2 and Methods) clearly indicates the 1961, 1994 and 1997 events to be 
the most extreme pIODs. The growth of equatorial SST anomalies during 
these three events is dominated by nonlinear processes involving zonal 
current anomalies. In particular, the nonlinear zonal advection, that is, 
the product of the anomalous west-minus-east SST gradient with the 
anomalous zonal currents (dark red bar, Extended Data Fig. 1c), sets these 
three events apart from the rest. Essentially, the equatorial positive feedback 
enhances anomalies of westward-flowing equatorial winds and currents, 
allowing for an eventual reversal. This nonlinear process can be parame- 
terized by the product of the equatorial easterly anomalies, which drive 
the current, and the dipole mode index (DMI)', which measures the 
west-minus-east SST gradient (see Methods). Such nonlinearity also occurs 
Figure 1 | Comparison of moderate and extreme pIOD and identification of
extreme pIOD events. a, b, September–November average rainfall (shading, in
units of mm day⁻¹) and wind stress (vector scale is shown in the top right
corner for each panel) anomalies associated with a moderate (1982) and
extreme (1997) pIOD. c, d, Principal variability patterns of rainfall obtained by
applying a statistical and signal processing method, EOF analysis, to satellite-era
rainfall anomaly data from the Global Precipitation Climatology Project
version 2 (see Methods), in the equatorial region (10° S–10° N, 40° E–100° E).
The associated rainfall and wind stress vectors from reanalysis data (see
Methods) are presented as linear regression onto the EOF time series. The
colour scale indicates rainfall in mm day⁻¹ per 1 s.d. change; blue or red


in the eastern pole, rendering a negative skewness of SST, in that cool 
anomalies off Sumatra grow to greater amplitude than warm anomalies”. 

The strong nonlinearity along the Equator means that the representa- 
tion of extreme pIOD impacts requires more than just the commonly 
used DMI. This along-the-Equator nonlinearity can be represented by 
two modes of empirical orthogonal function (EOF) of rainfall anomalies. 
The pattern of the first EOF (EOF1, 43.4% of the total variance, Fig. 1c) 
shows an east-west dipole of reduced convection, featuring anomalously 
cold SSTs and a shallow thermocline in the east but anomalies of opposite 
polarities in the west (Extended Data Fig. 3). This reflects characteristics 




contours indicate increased or decreased rain. Note the different vector scales
in c and d. e, Relationship between the two principal component time series.
Values for 1961 are obtained by regressing the rainfall anomaly pattern from a
reanalysis onto the EOF1 and EOF2 patterns (see Methods). An extreme pIOD
event (red dots) is defined as when the first principal component is greater
than 1 s.d. and the second principal component is greater than 0.5 s.d. A
moderate pIOD event (green dots) is determined from a detrended DMI'
when its amplitude is greater than 0.75 s.d. other than the 1994 and 1997
events. Negative IOD and neutral years are indicated with blue dots.

f, Relationship between the second principal component time series and the 
DMI. 


of pIOD events commonly depicted by the DMI. EOF2, which accounts
for 20.7% of the total variance, on the other hand, reflects pronounced 
anomalous conditions during extreme pIOD events, as described above. 
Both EOFs feature enhanced convection over the western tropical Indian 
Ocean and equatorial Africa (Fig. 1d, Extended Data Fig. 3). 

EOF1 and EOF2 (or the DMI) display a nonlinear relationship (Fig. 1e,
f). During a moderate event, the two EOFs are of opposite sign. Thus, the
associated rainfall anomalies tend to offset over the central Indian 
Ocean. In contrast, during an extreme pIOD, both EOFs are positive, 
rendering large negative rainfall anomalies that extend westward along 
Figure 2 | Multi-model ensemble average of the
principal variability patterns of austral spring 
season rainfall and their nonlinear relationship. 
a, b, First and second principal variability patterns 
of rainfall anomalies referenced to the ‘control’ 
period (1900-1999), obtained by applying an EOF 
analysis to rainfall anomalies in the equatorial region 
(10° S-10° N, 40° E-100° E). Note the different 
vector scales in a and b. The associated pattern and 
wind stress vectors beyond the domain are obtained 
by a linear regression onto the EOF time series. The 
colour scale indicates rainfall in mm day⁻¹ per 1.0 s.d.
change; blue or red contours indicate increased or 
decreased rainfall. c, d, A nonlinear relationship 
between the principal component time series for the 
‘control’ (1900-1999) and ‘climate change’ (2000- 
2099) periods. An extreme pIOD event (red dots) is 
defined as when the first principal component is 
greater than 1 s.d. and the second principal
component is greater than 0.5 s.d. A moderate pIOD
event (green dots) is determined from a detrended 
DMI when its amplitude is greater than 0.75 s.d. but 
is not an extreme pIOD event. Negative IOD and 
neutral years are indicated with blue dots. The 
number of extreme and moderate pIOD events is 
indicated. 


Figure 3 | Multi-model statistics associated with 
the increase in frequency of extreme pIOD
events. a, Multi-model ensemble histogram of zonal
wind stress τx anomalies in the equatorial Indian
Ocean (5° S-5° N, 60° E-100° E), referenced to the 
‘control’ period. These are averaged over the July- 
October months of Indian Ocean dipole development 
phase. Values during extreme pIOD years in each 
period are separated into 5 X 10 *Nm ” bins 
centred at the tick point for the ‘control’ (blue) and 
‘climate change’ (red) periods. The multi-model 
median for the ‘control’ (dashed blue line) and the 
‘climate change’ (dashed red line) periods are 
indicated. b, The same as a but for the product of τx
anomalies shown in a multiplied by the DMI'
(separated into 0.01 N m⁻² °C bins), approximating
the nonlinear zonal advection (see Methods). c, d, The 
same as a and b but for all years excluding extreme 
pIOD events. The histogram for extreme pIOD is
statistically different above the 95% confidence level 
from that for non-extreme pIOD events, for both the 
‘control’ and the ‘climate change’ periods. On average, 
nonlinear advection is greater for extreme pIOD
events than for non-extreme pIOD events. 
Figure 4 | Multi-model ensemble average of rainfall anomalies (referenced
to the ‘control’ period) during extreme pIOD events and associated statistics
in affected regions. a, b and c, Ensemble average rainfall anomalies in the
‘control’ and ‘climate change’ periods, and their difference (‘climate change’
minus ‘control’). Stippling in c indicates regions where the differences are
statistically significant at the 95% level as determined by a two-sided Student’s t-test.
d, e, Multi-model ensemble histogram of the rainfall anomalies over northern
equatorial East Africa (0°–5° N, 37.5° E–47° E) for extreme pIOD events and

all events other than extreme pIOD events, respectively. All extreme pIOD events 


the Equator. The pIOD events of 1961, 1994 and 1997 are determined
to be ‘extreme’ when EOF1 is greater than a 1-standard-deviation (s.d.)
value and EOF2 is greater than a positive 0.5-s.d. value. The characteristics
of the pIOD events are only fully captured by the superimposition of these
two EOFs (Extended Data Fig. 4). Without EOF2, the salient feature of the
westward extending equatorial anomalies that characterizes extreme pIOD
would be missed. A similar EOF analysis on vertical velocity ω at 500 mb, a
measure of convection, generates similar patterns (Extended Data Fig. 5).
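
The two-threshold definition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `classify_piod` and the label strings are hypothetical, and each threshold is applied to the standard deviation of the series passed in.

```python
import numpy as np

def classify_piod(pc1, pc2, dmi):
    """Label each year as 'extreme', 'moderate' or 'other' (hypothetical helper).

    pc1, pc2: first and second rainfall principal component time series.
    dmi: detrended dipole mode index for the same years.
    """
    pc1, pc2, dmi = (np.asarray(a, dtype=float) for a in (pc1, pc2, dmi))
    # Extreme: PC1 above 1 s.d. and PC2 above 0.5 s.d.
    extreme = (pc1 > 1.0 * pc1.std()) & (pc2 > 0.5 * pc2.std())
    # Moderate: DMI above 0.75 s.d. but not already classed as extreme.
    moderate = (dmi > 0.75 * dmi.std()) & ~extreme
    labels = np.full(pc1.shape, "other", dtype=object)
    labels[moderate] = "moderate"
    labels[extreme] = "extreme"
    return labels
```

Requiring both components to be positive and large is what separates the westward-extending equatorial anomalies from an ordinary east-west dipole.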

To assess the influence of greenhouse warming, we use the Coupled 
Model Intercomparison Project phase 5 (CMIP5*°) multi-model database. 
The coupled general circulation models (CGCMs) used in this study are 
forced with historical anthropogenic and natural forcings, and future green- 
house-gas emission scenarios of Representative Concentration Pathway 
(RCP) 8.5, covering the 1900-2099 period. Not all of the 31 CGCMs considered
here are able to simulate the characteristics of observed pIOD events.
We focus on 23 CGCMs that simulate negative skewness of SST off Sumatra
as well as the nonlinear relationship of the two rainfall EOFs (Extended Data
Table 1, Fig. 2). An identical EOF analysis of ω at 500 mb in 21 out of the 23
selected CGCMs, in which ω is available, produces similar spatial patterns
and their nonlinear relationship (Extended Data Fig. 6). From these 23
CGCMs, we define extreme pIOD events in the same manner as for 
the observed events, and compare their frequency in the first (1900- 
1999) and second (2000-2099) hundred-year periods. These two adja- 
cent periods within a transient scenario are referred to as the ‘control’ 
and ‘climate change’ periods, respectively. 

In aggregate, the frequency of extreme pIOD events based on rainfall
EOFs increases by a factor of 2.7, from about one event every 17.3 years 
(133 events in 2,300 years) in the ‘control’ period, to one every 6.3 years 
(367 events in 2,300 years) in the ‘climate change’ period (Fig. 2c and d). 
This is statistically significant according to a bootstrap test”, underscored 
by a strong inter-model consensus, with 21 out of 23 models simulating an 
increase (Extended Data Table 1). Sensitivity tests to varying definitions of 
extreme pIOD further support the robustness of this result (Supplementary 
Tables 1 and 2). 
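
The return periods quoted above follow directly from the event counts. The short calculation below is a worked check, assuming 23 models with 100 years in each period; the variable names are illustrative only.

```python
# Worked check of the aggregated frequencies (variable names are illustrative).
n_models = 23
years_per_period = 100
total_years = n_models * years_per_period      # 2,300 model-years per period

control_events = 133        # extreme pIOD events, 'control' period
future_events = 367         # extreme pIOD events, 'climate change' period

control_return = total_years / control_events  # years between events, 'control'
future_return = total_years / future_events    # years between events, 'climate change'
frequency_ratio = future_events / control_events

print(f"control: one event every {control_return:.1f} years")
print(f"climate change: one event every {future_return:.1f} years")
print(f"frequency ratio: {frequency_ratio:.2f}")
```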


in each period are separated into 0.5 mm day⁻¹ bins centred at the tick point
for the ‘control’ (blue) and ‘climate change’ (red) periods. The multi-model mean 
for the ‘control’ (blue dashed line) and the ‘climate change’ (red dashed line) 
periods are indicated. In each period the histograms for extreme pIOD (for 
example, red bars in Fig. 4d) and non-extreme pIOD (for example, red bars in 
Fig. 4e) are statistically different above the 95% confidence level. f, g, The same as 
d and e, but for the Java region (8° S-6° S, 105.5° E-108.5° E), separated into 
1 mm day⁻¹ bins. The two histograms in d and e are statistically different above
the 95% confidence level, but this is not the case for the two histograms in f and g. 


Development of pIOD events can interact with an El Niño event’***.
The 1997 extreme pIOD developed in conjunction with the strongest El
Niño of the twentieth century. The 1961 and 1994 extreme pIODs, on the
other hand, occurred without an El Niño, supporting the notion that the
generating mechanism behind an extreme pIOD event lies within the
Indian Ocean’. We find no evidence that the increase in extreme pIOD
events in the ‘climate change’ period is induced by a change in the frequency
of El Niño or El Niño Modoki occurrences (see Methods and Extended
Data Fig. 7).

Rather, the increase in extreme pIOD events appears to arise from mean 
state changes within the Indian Ocean (Extended Data Fig. 8), consis- 
tent with a weakening Walker circulation as projected under greenhouse 
warming’. Relative to the ‘control’ period, the altered mean state is more
conducive to equatorial easterly winds, westward oceanic currents, an enhanced
west-minus-east SST gradient, and the associated nonlinear zonal
advection. There is a strong link between climatologically stronger easterly 
winds along the Equator and more occurrences of a given nonlinear advec- 
tion (correlation coefficient r= 0.9, not shown). These changes lead to 
increasing occurrences of extreme pIOD events, because a smaller per- 
turbation is required in the ‘climate change’ period to generate the same 
size of nonlinear zonal advection as seen during extreme pIOD events in the
‘control’ period (see Extended Data Fig. 9). Thus, there are increased occur- 
rences of extreme pIOD events for a given size of the equatorial easterly 
anomaly (Fig. 3a), or a given strength of nonlinear advection (Fig. 3b). On 
the other hand, the changes associated with non-extreme pIOD events are 
not as apparent (Fig. 3c, d). 

The increased frequency in extreme pIODs does not translate to greater 
intensity of rainfall anomalies over all regions affected by the pIOD (Fig. 4a–c).
Over northeastern equatorial Africa, the extreme pIOD-induced wet
events do become more intense in the ‘climate change’ period than in the 
‘control’ period (Fig. 4d; the means are statistically different above the 95% 
confidence level). In contrast, there is no statistically significant difference 
between the two periods in the intensity of dry episodes over Java (Fig. 4f). In 
addition, the difference in rainfall intensity of the extreme events is generally 
smaller than the difference in the mean rainfall (comparing Fig. 4a and
Extended Data Fig. 8a), despite the far greater anomalies during extreme
pIOD events. This illustrates that in general the impacts of extreme
pIOD events experienced in the ‘control’ period will repeat more frequently
in the ‘climate change’ period.

In summary, our finding of a greenhouse-induced increased frequency 
of extreme pIOD events is in stark contrast with previous results of no 
change in pIOD frequency about the evolving background state. By identifying
nonlinear processes that give rise to extreme pIOD events, we show
that under greenhouse warming, the equatorial Indian Ocean evolves
towards climatologically stronger west-minus-east temperature gradients
and easterly winds, a state that is more susceptible to producing
extreme pIOD events. With the projected large increase in extreme
pIOD events, we should expect more frequent occurrences of devastating
weather events in affected regions.


METHODS SUMMARY 


The extreme pIOD events were diagnosed using a suite of distinct process-based 
indicators—such as anomalous equatorial easterlies, low rainfall and atmospheric 
subsidence—as induced by a downstream extension of the southeasterly trades. For 
observations, we focus on historical events in the satellite era (1979-present) monthly 
precipitation analysis, SSTs and other circulation fields from a global reanalysis (see 
Methods). We focus on austral spring, September-November, in which a pIOD typ- 
ically peaks. A heat budget analysis for the eastern-to-central equatorial Indian Ocean 
using the European Centre for Medium-Range Weather Forecasts - Ocean Reanalysis 
System 3 reveals that the strong nonlinear zonal advection of heat sets the observed 
1994 and 1997 extreme pIOD events apart from other events. The nonlinearity sug- 
gests that the traditional DMI, defined as the difference in SST anomalies between 
the western (50° E-70° E and 10° S-10° N) and eastern (90° E-110° E and 10° S- 
0° S) parts of the Indian Ocean’, is not sufficient to differentiate an extreme pIOD
event. Thus, we propose an identification method for extreme pIOD, in which we 
apply EOF analysis to rainfall anomalies and vertical velocity ω at 500 mb in the
equatorial Indian Ocean (40° E-100° E, 10° S-10° N). This produces two principal 
variability patterns, one depicting an east-west pattern and the other depicting dry 
conditions along the central Indian Ocean extending from the east. An extreme 
pIOD event is defined as when the first principal time series is greater than 1 s.d., 
and the second greater than 0.5 s.d. This definition exclusively captures the three 
observed extreme pIOD events. To select CGCMs, the method is applied to 31 
CMIP5 CGCMs, each covering 105 years of a pre-twenty-first-century climate 
change simulation using historical anthropogenic and natural forcings (1901-2005) 
and a further 95 years (2006-2100) under the RCP8.5 forcing scenario”®. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


Data, reanalyses and EOF analysis. We used data in the satellite era (1979–present),
which include Global Precipitation Climatology Project monthly precipitation analysis*',
global analyses of SSTs”, and circulation fields from the National Centers for
Environmental Prediction and National Center for Atmospheric Research global reanalysis*’.
Ocean column data of velocities and temperatures for heat budget analysis
are based on the European Centre for Medium-Range Weather Forecasts Ocean
Reanalysis System 3 (ECMWF ORA-S3)**. We apply a multivariate signal processing
method referred to as EOF analysis* to anomalies of rainfall and vertical velocity
at 500 mb (ref. 33). The EOF technique decomposes the spatio-temporal variability
into orthogonal modes, each described by a principal spatial pattern and an associated
principal component time series.
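
For readers unfamiliar with the technique, an EOF decomposition can be sketched with a singular value decomposition. The example below is a minimal illustration, not the authors' implementation: the function name `eof_analysis` is hypothetical, the input is assumed to be a (time, space) anomaly matrix, and latitude-dependent area weighting is omitted for brevity. The principal components are scaled to unit standard deviation, as done for the inter-model comparison in this study.

```python
import numpy as np

def eof_analysis(anom, n_modes=2):
    """Leading EOFs of an anomaly matrix of shape (n_time, n_space).

    Returns (patterns, pcs, var_frac):
      patterns: (n_modes, n_space) spatial patterns per 1 s.d. of the PC
      pcs:      (n_time, n_modes) principal components with unit s.d.
      var_frac: fraction of total variance explained by each mode
    """
    anom = np.asarray(anom, dtype=float)
    anom = anom - anom.mean(axis=0)            # remove the time mean at each point
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)
    pcs = u[:, :n_modes] * s[:n_modes]         # unscaled principal components
    scale = pcs.std(axis=0)
    pcs = pcs / scale                          # unit standard deviation
    patterns = vt[:n_modes] * scale[:, None]   # carry the amplitude in the pattern
    return patterns, pcs, var_frac[:n_modes]
```

Scaling the time series rather than the patterns means a threshold such as "greater than 1 s.d." can be applied identically to every model.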

Heat budget analysis. We examine the surface heat balance of the tropical Indian 
Ocean which can be expressed as: 


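$$
\begin{aligned}
\frac{\partial T^{a}}{\partial t} = -\Big[ &\Big( u^{a}\frac{\partial T^{a}}{\partial x} + \bar{u}\frac{\partial T^{a}}{\partial x} + u^{a}\frac{\partial \bar{T}}{\partial x} \Big) \\
+ &\Big( v^{a}\frac{\partial T^{a}}{\partial y} + \bar{v}\frac{\partial T^{a}}{\partial y} + v^{a}\frac{\partial \bar{T}}{\partial y} \Big) \\
+ &\Big( w^{a}\frac{\partial T^{a}}{\partial z} + \bar{w}\frac{\partial T^{a}}{\partial z} + w^{a}\frac{\partial \bar{T}}{\partial z} \Big) \Big] + Q + \text{residual}
\end{aligned}
\tag{1}
$$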


The variables T, u, v and w are potential temperature, and the zonal, meridional and
vertical ocean current velocities, respectively, averaged over the top 50 m. The coordinates
x, y and z are along the zonal, meridional and vertical directions, respectively, and t is time.
All variables are derived from the ECMWF ORA-S3 observational data
assimilation system* at a horizontal resolution of 1° latitude by 1° longitude, increasing
to 0.3° in latitude towards the Equator. The rate of change of the mixed layer temperature
($\partial T/\partial t$) is calculated using a centred-difference approximation. Superscript ‘a’ and
overbar denote anomalous and long-term averaged quantities, respectively. Equation
(1) states that the rate of change or tendency of the surface temperature is balanced by 
zonal advection of heat by the zonal current (first bracketed terms on the right hand 
side), meridional advection (second bracketed terms), vertical advection (third brack- 
eted terms), the net surface air-sea heat flux (Q), and all other factors not explicitly 
expressed (residual), such as mixing and diffusion. 
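
As an illustration of how the zonal terms of equation (1) are formed, the sketch below splits the current and temperature into mean and anomaly parts and evaluates the three zonal advection terms along one latitude. It is a simplified example under stated assumptions: the function names are hypothetical, the long-term average is taken as a plain time mean rather than a monthly climatology, and a uniform grid spacing is assumed.

```python
import numpy as np

def zonal_advection_terms(u, T, dx):
    """Zonal advection terms of the heat budget, with the leading minus sign.

    u, T: arrays of shape (n_time, n_lon), zonal current and temperature.
    dx:   uniform zonal grid spacing (same length unit as u uses).
    """
    u = np.asarray(u, dtype=float)
    T = np.asarray(T, dtype=float)
    u_bar, T_bar = u.mean(axis=0), T.mean(axis=0)   # long-term means (overbar)
    u_a, T_a = u - u_bar, T - T_bar                 # anomalies (superscript a)
    dTa_dx = np.gradient(T_a, dx, axis=1)
    dTbar_dx = np.gradient(T_bar, dx)
    return {
        "nonlinear": -u_a * dTa_dx,       # anomalous current, anomalous gradient
        "mean_current": -u_bar * dTa_dx,  # mean current, anomalous gradient
        "mean_gradient": -u_a * dTbar_dx, # anomalous current, mean gradient
    }

def tendency(T_a, dt):
    """Centred-difference rate of change along the time axis (axis 0)."""
    return np.gradient(np.asarray(T_a, dtype=float), dt, axis=0)
```

The "nonlinear" entry corresponds to the term that the text identifies as setting the 1961, 1994 and 1997 events apart.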

We use the entire reanalysis period of the ORA-S3, which spans 1959-2006, to 
examine processes during the 1961 event. All variables in equation (1) are linearly de- 
trended and averaged over the eastern-to-central equatorial region between 5° S-5° N 
and 60° E-100° E, over which the 1961, 1994 and 1997 extreme events emerge as the 
only pIODs that exhibit large anomalous cooling (Extended Data Fig. 1a). We examine
the heat budget terms averaged over the developing period of Indian Ocean dipole 
events (July-October; Extended Data Fig. 1b). It may be noted that the 1997 event 
exhibits exceptionally strong and prolonged cooling compared to the 1961 and 1994 
events, which see an earlier start of the cooling at the end of boreal spring. 

The nonlinear vertical advective process ($w^{a}\,\partial T^{a}/\partial z$), that is, the process associated
with anomalous upwelling and anomalous vertical temperature gradients, contributes
substantially to the cooling of the equatorial Indian Ocean during these events, especially during
moderate pIOD events (Extended Data Fig. 1c). During the 1961, 1994 and 1997 events,
however, the nonlinear zonal advection term ($u^{a}\,\partial T^{a}/\partial x$) is exceptionally strong,
extending notably farther to the west from the eastern Indian Ocean, as compared to
the other events (Extended Data Fig. 2a-d). Although the nonlinear vertical advection, which is in
part also driven by the equatorial easterly winds, is more prominent during the 1961 and
1997 events (Extended Data Fig. 1c), it is the nonlinear zonal advection that
sets the 1961, 1994 and 1997 events apart from the rest of the pIODs. This stems from the
exceptionally strong westward current and its associated easterly winds (Extended Data
Fig. 2f).
As shown in Extended Data Fig. 2f, the significant correlation between the zonal wind
and current (r = 0.87) means that the nonlinear zonal advection term over the equatorial
region can be well approximated as a product between the zonal wind (averaged
over 5° S–5° N, 60° E–100° E) and the DMI:

$$
-u^{a}\frac{\partial T^{a}}{\partial x} \approx \alpha\,\tau_{x}\,\mathrm{DMI} \tag{2}
$$

with α = β/L, where τx is the zonal wind, β is the regression coefficient between the zonal wind and zonal
current, and L is the longitudinal width of the equatorial box (Extended Data Fig. 2a). This
parameterization is used to represent the nonlinear zonal advection term in the CGCMs.
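
A minimal sketch of this parameterization follows. It assumes yearly, box-averaged series of zonal wind, zonal current and DMI are already available; the function name `advection_proxy` and its argument names are hypothetical, and the regression is an ordinary least-squares fit of current on wind.

```python
import numpy as np

def advection_proxy(tau_x, u, dmi, box_width):
    """Proxy for the nonlinear zonal advection: alpha * tau_x * DMI.

    tau_x: box-averaged zonal wind, one value per year.
    u:     box-averaged zonal current for the same years.
    dmi:   dipole mode index (west-minus-east SST difference).
    box_width: longitudinal width L of the equatorial box.
    """
    tau_x, u, dmi = (np.asarray(a, dtype=float) for a in (tau_x, u, dmi))
    beta = np.polyfit(tau_x, u, 1)[0]   # slope of current regressed on wind
    alpha = beta / box_width            # alpha = beta / L
    return alpha * tau_x * dmi
```

With easterly (negative) wind anomalies and a positive DMI the proxy is negative, which corresponds to a cooling tendency, in line with the sign convention of equation (1).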

Strikingly similar to the nonlinear zonal advection, the proxy exclusively identifies the 
three observed extreme events. It may be noted that the proxy is further from the actual 
value for the 1961 and 1997 events than for the 1994 event. This is expected, owing to the 
particularly strong nonlinear vertical advection of the 1961 and 1997 events. Using 93-year
time series of nine CMIP5 models for which we had access to the required variables, the 
robustness of the proxy is signified by the high positive correlation coefficient with the 
nonlinear advection, ranging from 0.59 to 0.92, significant above the 99% confidence level. 
Characterization of extreme pIOD events. The extreme pIOD events were 
characterized using a suite of distinctive process-based indicators, such as anomalous 
equatorial easterlies, low rainfall and atmospheric subsidence as induced by a downstream
extension of the southeasterly trade winds. For observations, we focus on historical
events in the satellite era (1979–present) using Global Precipitation Climatology
Project monthly precipitation analysis*' from http://www.esrl.noaa.gov/psd/data/gridded/data.gpcp.html
and SSTs* and other circulation fields from a global reanalysis**.
We focus on austral spring, September–November, in which a pIOD
typically peaks, and apply EOF analysis** to rainfall anomalies and ω (ref. 33) at
500 mb in the equatorial Indian Ocean (40° E–100° E, 10° S–10° N). This produces
two principal variability patterns. The first principal pattern reflects a strong rainfall
reduction over the eastern pole accompanied by a moderate rainfall increase
over the western equatorial Indian Ocean, and the second principal pattern is characterized
by a westward extension of rainfall reduction from the east, accompanied
by a rainfall increase farther west near equatorial east Africa. Note that the wetting
and northwesterly winds off Sumatra in EOF2 are opposite to the drying and south- 
easterly anomalies in EOF1. Along the Equator, dry anomalies north of Sumatra 
embedded in EOF2 oppose weak wet anomalies in EOF1. The opposing polarity 
highlights that during extreme pIOD events, the centre of cold and dry anomalies 
is not concentrated in the Sumatra region but shifts northward for a westward 
extension along the Equator. 

Model selection. We utilize 31 CMIP5 CGCMs forced with historical anthropogenic 
and natural forcings, and future greenhouse-gas emissions under the
Representative Concentration Pathway (RCP) 8.5°° scenario, covering a 200-year period. Two
features of the nonlinearity associated with extreme pIODs are used to select models. 
These are the negative skewness of SST anomalies over the eastern pole, and the non- 
linear positive feedback along the Equator involving the west-minus-east SST gradient, 
wind and oceanic currents, and nonlinear zonal advection, as indicated by a nonlinear 
relationship between the two EOFs. These two features are not mutually inclusive and 
are both used in our study. 

Although the majority of CGCMs generate variability like that of the Indian Ocean 
dipole, only a subgroup of CGCMs simulate the observed nonlinear ocean—atmosphere 
coupling over the eastern Indian Ocean as depicted by the negative skewness of SST 
anomalies over the eastern pole during the austral spring (September-November), 
which is −0.85 in observations since 1979. The level of nonlinearity varies vastly among
CGCMs, and we consider negative skewness of any extent. Out of the 31 CGCMs, 23
satisfy the SST skewness criterion. The selected CGCMs yield a mean skewness of −0.84,
close to the observed (Extended Data Table 1).
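
The skewness criterion can be illustrated as follows. The sketch assumes the conventional definition of skewness as the normalized third central moment, which the text does not spell out; the function name `skewness` is hypothetical.

```python
import numpy as np

def skewness(x):
    """Normalized third central moment of a series (assumed definition)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

# A series with one large cold excursion is negatively skewed,
# as for SST anomalies over the eastern pole.
sst_anomalies = [1.0, 1.0, 1.0, -3.0]
print(skewness(sst_anomalies) < 0)
```

Under the selection rule described above, a model passes this test whenever the value for its eastern-pole SST anomalies is negative, whatever its magnitude.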

All selected 23 CGCMs reproduce the observed IOD pattern obtained by regressing 
September-November SST anomalies onto the DMI, with a pattern correlation greater 
than 0.75 (Supplementary Table 3). The same EOF analysis is carried out for each 
individual model using rainfall anomalies referenced to the mean over the ‘control’ 
period. Prior to the analysis, data are interpolated into a common grid of 1.5° latitude by 
1.5° longitude. Our EOF outputs are scaled so that the EOF time series have a standard 
deviation of one to facilitate an inter-model comparison and aggregation. Details of the 
variance explained by EOF1 and EOF2 are listed in Supplementary Table 3. All 23 
models produce the nonlinear relationship between the two leading rainfall EOFs, 
indicating their ability to generate the nonlinear equatorial positive feedback associated 
with the extreme pIOD. Outputs of ω at 500 mb are available in 21 of the 23 CGCMs, and
a nonlinear relationship between the two leading vertical velocity EOFs is generated in all
the 21 models.

We derive changes in the occurrence of extreme pIOD events by comparing the 
frequency of the first 100 years (‘control’ period) to that of the second 100 years (‘climate 
change’ period). Of the eight CGCMs which are not able to simulate the negative SST 
skewness, only three CGCMs are not able to reproduce the nonlinear relationship 
between the two rainfall EOFs, suggesting that the negative skewness of SST anomalies 
in the eastern pole is not a prerequisite for the equatorial positive feedback associated 
with extreme pIOD events. We also test the sensitivity of our results to varying defini- 
tions (Supplementary Tables 1 and 2), including a case in which the criterion of the 
negative SST skewness is excluded: that is, including all 31 CGCMs. In all cases, there is a 
statistically significant increase (greater than a 130% increase) in the occurrences of 
extreme pIOD events from the ‘control’ to the ‘climate change’ period. 

Occurrences of extreme pIOD and the El Niño. The modelled increase in extreme
pIOD events is not induced by a change in the frequency of El Niño occurrences,
because there is no inter-model consensus between the two periods in the frequency
change of El Niño, defined as when the quadratically detrended Niño3 (5° S–5° N,
150° W–90° W) SST is greater than 0.5 s.d. (Extended Data Fig. 7a), consistent with
previous studies***’. Nor is there a statistically significant relationship between changes
in the number of extreme pIOD events and changes in the number of El Niño events
(Extended Data Fig. 7a), extreme El Niño events, defined by Niño3 rainfall greater than a
threshold value (Extended Data Fig. 7b)”*, or detrended Niño3 SST greater than a
threshold value (for example, 1.5 s.d.) (Extended Data Fig. 7c). In addition, there is
no systematic change in the relationship between the Indian Ocean dipole and the El
Niño/Southern Oscillation (ENSO) (Extended Data Fig. 7d)*". Similarly, there is no
inter-model consensus on how Modoki El Niño, defined as occurring when the index”
is greater than 0.5 s.d., will change. Nor is there a statistically significant relationship
between changes in the number of extreme pIOD events and changes in the number of
Modoki El Niño events (Extended Data Fig. 7e), and there is little change in the
relationship between the Indian Ocean dipole and the Modoki ENSO (Extended
Data Fig. 7f).
Statistical significance test. We use a bootstrap method” to examine whether the 
change in frequency of the extreme pIOD events is statistically significant. The 2,300 
samples from the 23 CMIP5 CGCMs in the ‘control’ period are re-sampled randomly 
to construct another 10,000 realizations of 2,300-year records. In the random resampling
process, any extreme pIOD event is allowed to be selected again. The standard
deviation of the extreme pIOD frequency in the inter-realization is 11.2 events per 2,300 
years, far smaller than the difference of 234 events per 2,300 years between the ‘control’ 
and the ‘climate change’ periods (Fig. 2c, d), indicating a strong statistical significance. 
The maximum frequency is 176, far smaller than that in the ‘climate change’ period of 
367. Increasing the realizations to 20,000 or 30,000 yields essentially an identical result. 
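
The resampling procedure described above can be sketched as a bootstrap with replacement over the 2,300 'control' years. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function name `bootstrap_event_counts` is hypothetical, each year is reduced to a yes/no flag for an extreme pIOD, and the random seed is arbitrary.

```python
import numpy as np

def bootstrap_event_counts(is_extreme, n_realizations=10_000, seed=0):
    """Event counts in bootstrap realizations of a yearly event record.

    is_extreme: boolean array, True where a year hosts an extreme pIOD.
    Each realization draws len(is_extreme) years with replacement.
    """
    rng = np.random.default_rng(seed)
    is_extreme = np.asarray(is_extreme, dtype=bool)
    n = is_extreme.size
    counts = np.empty(n_realizations, dtype=int)
    for i in range(n_realizations):
        idx = rng.integers(0, n, size=n)   # sample years with replacement
        counts[i] = is_extreme[idx].sum()
    return counts

# 'Control' period: 133 extreme events in 2,300 model-years.
control_years = np.zeros(2300, dtype=bool)
control_years[:133] = True
counts = bootstrap_event_counts(control_years)
print(counts.std(), counts.max())
```

The spread of these counts is the yardstick against which the difference of 234 events between the two periods is judged.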

To further confirm the statistical significance of our result with ample samples of 
IOD behaviour across a longer time series without climate change forcing, we use a
Canadian model (CanESM2), for which a pre-industrial simulation of 996 years is available. We
examine the rarity of extreme pIOD events in that simulation relative to the ‘climate change’ period
of this same model. In the pre-industrial period the frequency is one per 13 years, but
in the ‘climate change’ period there is a 180% increase to one event per 5 years. In the
pre-industrial period, such an extreme pIOD event is far rarer.

Dividing the 996 years into 9 sets of 100 years and a set of 96 years, we find no
frequency in these sets is as high as that in the ‘climate change’ period. The lowest
frequency is one event per 25 years, and the highest frequency, one event per 7.7 years, is
about 35% lower than the frequency in the ‘climate change’ period. This highlights the
robustness of the greenhouse-warming-induced increase in the extreme pIOD fre- 
quency, above that generated by natural variability, which is represented by the spread 
of inter-model differences. 
Extended Data Figure 1 | Heat budget analysis of the extreme pIOD events
based on an ocean reanalysis. a, Temperature anomalies averaged over
5° S-5° N and 60° E-100° E, over the top 50 m, and over September-November.
The filled blue and red circles indicate negative and positive DMI,
with the size of the markers indicating the relative strength of the DMI. b, The
rate of change of the temperature anomalies as a function of calendar month for
all positive DMI values, with that of 1961 shown in green, 1994 in light red and
1997 in dark red, and all others in grey. c, The heat budget components
averaged over July-October of Indian Ocean dipole development phase, for the
1961, 1994 and 1997 extreme events, and a composite of moderate pIOD events
in the satellite era (1982, 1987, 2002 and 2006). The uncertainty bar on each
composite represents the range of values over the four moderate pIOD events.
The nonlinear zonal advection term ($u'\,\partial T'/\partial x$) (dark red in c) is particularly
large during the 1961, 1994 and 1997 events (see Methods for more details).
Extended Data Figure 2 | Nonlinear zonal advection term over the growth
phase of pIOD events. The nonlinear zonal advection term ($u'\,\partial T'/\partial x$)
averaged over July to October for: a, a composite of moderate pIOD events,
b, the 1961 pIOD event, c, the 1994 pIOD event and d, the 1997 pIOD event.
The moderate pIOD events taken for the composite in a are those in the
satellite era: the 1982, 1987, 2002 and 2006 events. Stippled locations in
a indicate composite values that are significant above the 95% confidence level
(P-value < 0.05) according to a Student’s t-test. e, The approximation of the
nonlinear advection term, $u'\,\partial T'/\partial x$, averaged over the equatorial boxed
region (5° S-5° N and 60° E-100° E; as shown in a) using the product between
the corresponding zonal wind stress and the DMI (see Methods). The DMI is a
measure of zonal gradient of temperature anomalies averaged over the western
and eastern boxed regions in a. f, The total zonal current versus total zonal wind
stress averaged over the equatorial box region in a. A particularly strong zonal
current reversal is seen during the 1961, 1994 and 1997 pIOD events (large red
dots in f, see Extended Data Fig. 1a).
principal variability patterns of austral spring (September-November)
rainfall. a-c, Vertical velocity ω at 500 mb (Pa s⁻¹) from reanalysis data
(positive indicating descending motion) (a), SST (°C) (ref. 32) (b) and
thermocline depth (m) (ref. 34) (c) anomalies associated with the first principal
[...] of the corresponding variables onto the principal component time series of
EOF1. d-f, The same as for a-c, but for the second principal variability pattern
(Fig. 1d).
Extended Data Figure 4 | Reconstruction of an extreme pIOD and a
moderate pIOD event using the first two principal rainfall variability
patterns. a-d, Composite of anomalies associated with the 1994 and 1997
extreme pIOD events, showing the observed rainfall and wind stress
anomalies, and anomalies reconstructed from the first principal, the second
principal, and the first and second principal components combined, using
satellite-era rainfall anomaly data from the Global Precipitation
Climatology Project version 2 (ref. 31) and reanalysis wind stress. Note
the different vector scales shown in the top right corner for each panel.
e-h, The same as a-d, but for composites of anomalies associated with the
1982, 1987, 2002 and 2006 moderate pIOD events. The exercise highlights
that the difference between a moderate and an extreme pIOD depends on
the role of the second principal component, and can only be realized with
the use of both of the two principal components.
Extended Data Figure 5 | Principal variability patterns of vertical velocity at
500 mb (ω), their nonlinear relationship, and the associated wind stress
vectors during austral spring (September-November), based on a
reanalysis. A positive vertical velocity indicates descending, while a negative
ω indicates ascending motion. a, b, Spatial patterns obtained by applying a
statistical and signal processing method (EOF analysis) to the vertical velocity
anomalies in the equatorial region (10° S-10° N, 40° E-100° E) for data since
1979. The associated pattern and wind stress vectors from reanalysis data are
obtained by linear regression onto the principal component time series of the
EOFs. The first and second principal spatial patterns account for 32.6% and
16.8% of the total variance, respectively. The colour scale indicates vertical velocity in Pa s⁻¹
per 1 s.d. change; blue or red contours indicate increased or decreased
convection. Note the different vector scales shown in the top right corner in
a and b. c, A nonlinear relationship between the associated principal
component time series. An extreme pIOD event (red dots) is defined as when
the first principal component is greater than 1 s.d., and the second principal
component is greater than 0.5 s.d. A moderate pIOD event (green dots) is
determined from a detrended DMI when its amplitude is greater than 0.75 s.d.,
except for the 1994 and 1997 extreme pIOD events. Negative IOD and neutral
years are indicated with blue dots. d, Relationship between the second principal
component time series and rainfall over the eastern equatorial Pacific (Niño3)
region (5° S-5° N, 150° W-90° W). While the 1997 extreme pIOD was
associated with a large rainfall in the Niño3 region, the 1961 and 1994 extreme
pIODs were not.
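The EOF analysis named in this caption can be illustrated with a singular value decomposition of an anomaly matrix. The sketch below is a generic textbook form under stated assumptions, not the authors' procedure: the function name `eof_analysis` is hypothetical, the input is assumed to be a (time × grid point) array, and no area weighting or detrending is applied.

```python
import numpy as np


def eof_analysis(field, n_modes=2):
    """Return leading EOF patterns, principal components and variance fractions.

    field: 2-D array with shape (n_times, n_points).
    """
    field = np.asarray(field, dtype=float)
    # Remove the time mean at every grid point to obtain anomalies.
    anomalies = field - field.mean(axis=0)
    # Rows of vt are spatial patterns; u * s gives the time series.
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    eofs = vt[:n_modes]
    pcs = u[:, :n_modes] * s[:n_modes]
    return eofs, pcs, variance_fraction[:n_modes]
```

The returned variance fractions play the role of the 32.6% and 16.8% quoted above, and the principal component time series are what the associated fields would be regressed onto.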
Extended Data Figure 6 | Multi-model ensemble average of the principal
variability patterns of vertical velocity at 500 mb (ω), their nonlinear
relationship, and the associated wind stress vectors during austral spring
(September-November). A positive vertical velocity indicates descending,
while a negative ω indicates ascending motion. a, b, Spatial patterns obtained by
applying a statistical and signal processing method (EOF analysis) to the
vertical velocity anomalies in the equatorial region (10° S-10° N, 40° E-100° E).
The associated pattern and wind stress vectors are obtained by linear regression
onto the principal component time series. The colour scale below gives vertical
velocity in m s⁻¹ per 1 s.d. change; blue or red contours indicate increased or
decreased convection. Note the different vector scales shown in the top right
corner in a and b. c and d, A nonlinear relationship between the two principal
component time series for the ‘control’ (1900-1999) and ‘climate change’
(2000-2099) periods. An extreme pIOD event (red dots) is defined as when the
first principal component is greater than 1 s.d. and the second principal
component is greater than 0.5 s.d. A moderate pIOD event (green dots) is
determined from a detrended DMI when its amplitude is greater than 0.75 s.d.
but is not an extreme pIOD event. Negative IOD and neutral years are indicated
with blue dots. The numbers of extreme pIOD and moderate pIOD events are
indicated in c and d.
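The event definitions in this caption amount to a simple threshold rule. A minimal sketch follows, assuming the two principal component series and the detrended DMI have already been expressed in units of their own standard deviation; the function name `classify_piod` and the label strings are hypothetical.

```python
import numpy as np


def classify_piod(pc1, pc2, dmi):
    """Label each year as 'extreme', 'moderate' or 'other'.

    All inputs are assumed to be in s.d. units (already normalized).
    """
    pc1, pc2, dmi = (np.asarray(x, dtype=float) for x in (pc1, pc2, dmi))
    # Extreme pIOD: PC1 > 1 s.d. and PC2 > 0.5 s.d.
    extreme = (pc1 > 1.0) & (pc2 > 0.5)
    # Moderate pIOD: detrended DMI > 0.75 s.d. but not an extreme event.
    moderate = (dmi > 0.75) & ~extreme
    return np.where(extreme, "extreme", np.where(moderate, "moderate", "other"))
```

Years labelled 'other' correspond to the negative IOD and neutral years shown as blue dots.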
Extended Data Figure 7 | Multi-model statistics between El Niño and pIOD
in selected CGCMs. a, Changes (‘climate change’ minus ‘control’ period) in
the number of occurrences of extreme pIOD events versus changes in the
number of El Niño events defined as when the amplitude of the detrended
Niño3 (5° S-5° N, 150° W-90° W) SST index is greater than 0.5 s.d. b, Changes
in the number of extreme pIOD events versus changes in the number of El Niño
events defined as when the Niño3 total rainfall is greater than 5 mm day⁻¹ as in
ref. 38. c, The same as b, except an extreme El Niño is determined from a
detrended Niño3 (5° S-5° N, 150° W-90° W) SST index when its amplitude is
greater than 1.5 s.d. d, Correlation between a detrended Niño3 index and a
detrended DMI index for the ‘climate change’ (y axis) and the ‘control’ periods
(x axis). e, Changes in the number of occurrences of extreme pIOD events
versus changes in the number of Modoki El Niño events defined as when the
amplitude of a detrended index (see Methods) is greater than 0.5 s.d.
f, Correlation between a detrended El Niño index and a detrended DMI index
for the ‘climate change’ (y axis) and the ‘control’ periods (x axis). The
inter-model correlation and its statistical significance or otherwise are indicated in
the bottom right corner of each panel, with a P-value less than 0.05 indicating
significance above the 95% confidence level, a condition not met in a, b, c and
e. Models with a stronger relationship between ENSO and the Indian Ocean
dipole in the ‘control’ period tend to have a stronger such relationship in the
‘climate change’ period, and the tendency is statistically significant, although
the relationship weakens slightly in the ‘climate change’ period. The same is
true for the Modoki relationship between ENSO and the Indian Ocean dipole.
changes for selected CGCMs. The changes (‘climate change’ minus ‘control’
period) of the ensemble average mean state for: a, rainfall (mm day⁻¹), b, SST
(°C), c, wind stress vectors (N m⁻²) and d, thermocline depth (m). The result
shows that rainfall off Sumatra is decreasing, the southern eastern Indian Ocean
[...] equatorial Indian Ocean, and the thermocline is shallowing in the eastern
equatorial Indian Ocean. Areas where changes are statistically significant at the
95% confidence level are indicated with stipples, in a, b, and d. In c, vectors in
bold indicate statistical significance at the 95% confidence level.
Extended Data Figure 9 | Schematic of extreme pIOD in response to
greenhouse warming. a, pIOD events are characterized by westward-flowing
wind anomalies (blue arrow at the surface) and the associated westward-flowing
current anomalies (blue arrow at depth) acting against the prevailing
background eastward circulations (black arrows), in association with the
anomalous positive west-minus-east SST gradient. These result in generally
weaker-than-normal eastward atmosphere and ocean circulations (grey
arrows), with anomalously wet condition in the west and dry in the east.
b, During extreme pIOD events, these anomalies are amplified, with
occurrences of strong reversals of the mean eastward winds and currents (grey
arrows). As the mean Walker circulation and the associated eastward-flowing
ocean current weaken (red arrows), wind and current reversals (orange arrows)
can occur more easily in association with pIOD anomalies. Greenhouse
warming thus induces more frequent occurrences of extreme pIOD events.
Extended Data Table 1 | Performance of 23 selected CMIP5 CGCMs forced under climate change emission scenario RCP8.5

Extreme pIOD event counts are given as (1900-1999)/(2000-2099) under two definitions:
A, EOF1 > 1 s.d. and EOF2 > 0.5 s.d.; B, EOF1 > 1.5 s.d. and EOF2 > 0.5 s.d.

CGCM             Data available   IODE skewness   Extreme pIOD    Extreme pIOD
                                  (1900-1999)     events (A)      events (B)
ACCESS1-0        SST, ω, τ        -0.2606         5/17            4/10
bcc-csm1-1-m     SST, ω           -0.7997         4/18            1/17
CanESM2          SST, ω, τ        -1.4207         1/22            1/22
CESM1-CAM5       SST, ω           -0.5826         8/22            8/22
CMCC-CESM        SST, ω           -1.2214         3/13            2/10
CNRM-CM5         SST, ω, τ        -0.2169         13/12           5/8
CSIRO-Mk3-6-0    SST, ω, τ        -0.3824         2/26            2/26
EC-EARTH         SST              -0.6927         6/12            6/12
FGOALS-s2        SST, ω, τ        -0.3378         4/29            2/25
FIO-ESM          SST, ω           -1.6906         4/16            4/14
GFDL-CM3         SST, ω, τ        -0.8189         5/20            5/20
GFDL-ESM2G       SST, ω           -1.1149         8/5             6/4
GFDL-ESM2M       SST, ω, τ        -0.9843         9/18            8/17
HadGEM2-AO       SST, ω           -0.2170         5/17            3/16
HadGEM2-CC       SST, τ           -0.2343         10/13           8/12
IPSL-CM5A-LR     SST, ω, τ        -0.7191         6/10            4/8
IPSL-CM5B-LR     SST, ω           -0.1511         9/12            3/11
MIROC5           SST, ω, τ        -1.7136         0/12            0/11
MPI-ESM-LR       SST, ω           -2.6676         5/12            4/12
MPI-ESM-MR       SST, ω           -1.3712         4/18            4/18
MRI-CGCM3        SST, ω, τ        -0.6987         5/16            4/14
NorESM1-M        SST, ω, τ        -0.5208         7/14            6/10
NorESM1-ME       SST, ω, τ        -0.5174         10/13           4/5

Total and % change between the two periods (23 models)   133/367 (176%)   94/324 (245%)
Years per event (23 models)                              17.3/6.3 years   24.5/7.1 years

These CGCMs are selected in terms of SST skewness in the eastern pole of the Indian Ocean dipole (IODE) and each model’s ability to simulate the nonlinear relationship between rainfall EOF1 and EOF2 (Fig. 2).
The sensitivity of changes in extreme pIOD events from the ‘control’ period to the ‘climate change’ period to different definitions is tested. An extreme pIOD event is defined as when the first principal component
time series is greater than 1 s.d., or 1.5 s.d., and the second principal component time series is greater than 0.5 s.d. Numbers in red type indicate a decrease from the ‘control’ period (1900-1999) to ‘climate
change’ period (2000-2099). Multi-model average SST skewness in the eastern pole of the Indian Ocean dipole is -0.84, compared with the observed value of -0.85. The subscripts SST, ω and τ indicate that the
data of SST, vertical velocity at 500 mb and surface wind stress are available, respectively.
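The summary rows of the table follow from the event totals by simple arithmetic, which the short sketch below reproduces. It assumes, as the Methods state, that each period pools 100 years from each of 23 models (2,300 model years); the helper names are hypothetical.

```python
def percent_change(control_events, future_events):
    """Percentage increase from the 'control' to the 'climate change' period."""
    return 100.0 * (future_events - control_events) / control_events


def years_per_event(n_events, n_years=2300):
    """Average number of model years between events."""
    return n_years / n_events


if __name__ == "__main__":
    for control, future in [(133, 367), (94, 324)]:
        print(
            f"{control}/{future}: {percent_change(control, future):.0f}% change, "
            f"{years_per_event(control):.1f}/{years_per_event(future):.1f} years per event"
        )
```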
Elevated CO₂ further lengthens growing season
under warming conditions


Melissa Reyes-Fox, Heidi Steltzer, M. J. Trlica, Gregory S. McMaster, Allan A. Andales, Dan R. LeCain & Jack A. Morgan


Observations of a longer growing season through earlier plant growth
in temperate to polar regions have been thought to be a response to
climate warming. However, data from experimental warming studies
indicate that many species that initiate leaf growth and flowering
earlier also reach seed maturation and senesce earlier, shortening their
active and reproductive periods. A conceptual model to explain
this apparent contradiction, and an analysis of the effect of elevated
CO₂ (which can delay annual life cycle events) on changing season
length, have not been tested. Here we show that experimental
warming in a temperate grassland led to a longer growing season
through earlier leaf emergence by the first species to leaf, often a grass,
and constant or delayed senescence by other species that were the last
to senesce, supporting the conceptual model. Elevated CO₂ further
extended growing, but not reproductive, season length in the warmed
grassland by conserving water, which enabled most species to remain
active longer. Our results suggest that a longer growing season, especially
in years or biomes where water is a limiting factor, is not due to
warming alone, but also to higher atmospheric CO₂ concentrations
that extend the active period of plant annual life cycles.

Climate varied considerably between the years studied in this experi- 
ment, most notably through a cool spring in 2009, greater precipitation 
than usual in 2009 and 2011, and low autumn soil water content in 2010 
(Fig. 1 and Extended Data Figs 1, 2). Nonetheless, in all but one year, 
climate warming conditions (cT, where c is a condition of relatively low 
CO₂ and T is relatively high temperature) changed the timing of species’
annual life cycles, increasing the length of the growing season and the 
reproductive season relative to control conditions (ct, where t is relatively 
low temperature) (Fig. 2). Warming led to earlier timing of events by the 
first species to leaf or flower in most years, but species’ sensitivity to 
warming varied among species and among years (Fig. 3 and Extended 
Data Table 1). Often, a cool-season grass, Koeleria macrantha, was the first 
species to leaf and the first species to flower in control (ct) and warmed 
(cT) plots. In most years, warming advanced leaf emergence and flowering 
of K. macrantha (Fig. 3; see Supplementary Information for timing of 
annual life cycle events for all species in all treatments in all years), yet, in 
2009, a year characterized by a cool spring (Fig. 1), warming delayed leaf
emergence of K. macrantha by 9 days. Warming delays leafing and flowering
for some species, although the mechanism behind this is not clear.

Contrasting species’ responses to warming between years limits inter- 
pretation of 5-year means; thus we present data yearly. Additionally, yearly 
data illustrate that the first and last species to complete annual life cycle 
events that determine the start and end of the growing or reproductive 
seasons shifted between treatments and between years (Fig. 3). These 
shifts indicate complementarity among species in response to interann- 
ual climate variation and warming, countering the tendency of warming 
to shorten the growing season. For example, in 2009, when warming (cT) 
delayed leaf emergence of K. macrantha, leaf emergence of Artemisia 
frigida, a sub-shrub, was not affected, leading to a shift in which species 
was the first to leaf and no change in timing for the start of the growing 
season relative to the control (ct) (Fig. 3). 

Warming (cT) led to earlier leaf emergence and flowering, but also to 
earlier seed maturation and canopy senescence for some species, espe- 
cially K. macrantha, relative to the control (ct) (Fig. 3). Seed maturation 
by A. frigida, consistently the last species to complete this event, was not 
affected by warming. Therefore, a longer reproductive season in response 
to warming primarily resulted from earlier flowering by K. macrantha. 
The mean active period for K. macrantha shortened (Fig. 4 and Extended 
Data Table 2), but longer growing seasons resulted, because A. frigida
and Hesperostipa comata did not change or delayed the timing of canopy
senescence (Fig. 3). Warming extended the duration of the mean reproductive
period over the 5 years for three of the six species, including K. macrantha 
(Fig. 4 and Extended Data Table 2), primarily through lengthening the 
reproductive period in 2011, the year with the most precipitation (Fig. 1 and 
Supplementary Information). Delayed canopy senescence due to warming 
and a later end of the growing season also only occurred in 2011 (Figs 2 
and 3). 

Variation in plant life history traits within the grassland, such as early 
season growth versus late season tissue maintenance, led to differences 
in species’ responses to warming, supporting a conceptual model of how 
individual species’ responses determine growing season length''. How- 
ever, in contrast to the model’s prediction, divergence in species’ active 
periods was small. Several species, including H. comata, lengthened their 
Figure 1 | Interannual variation in climate and microclimate (2007-2011).
Values are mean spring air temperature for day of year (DOY) 60-120 and
annual precipitation for the study site, and mean growing season (DOY
60-334) and autumn (DOY 244-304) soil water content for the control plots
(means ± 1 standard error of the mean (s.e.m.), n = 5 plots). Spring air
temperature and autumn soil water content correspond with timing of leaf
emergence and canopy senescence, respectively.


1 USDA-ARS, Soil Plant Nutrient Research Unit and Northern Plains Area, Fort Collins, Colorado 80526, USA. 2 Department of Biology, Fort Lewis College, Durango, Colorado 81301, USA. 3 Department of
Forest and Rangeland Stewardship, Colorado State University, Fort Collins, Colorado 80523, USA. 4 USDA-ARS, Agricultural Systems Research Unit and Northern Plains Area, Fort Collins, Colorado 80526,
USA. 5 Department of Soil and Crop Sciences, Colorado State University, Fort Collins, Colorado 80523, USA. 6 USDA-ARS, Rangeland Resources Research Unit, Fort Collins, Colorado 80526, USA.

active period in response to warming (Fig. 4), maintaining continuity of
seasonal growth by the plant community. Similarly, for several species
reproductive periods lengthened under warming (Fig. 4), limiting divergence
within the reproductive season. Thus, our results contrast with other
studies in which experimental warming led to divergent flowering responses
between species and mid-season, low floral abundance as the
climate warmed. Longer active and reproductive periods by at least some
species would reduce the adverse effects of warming on trophic interactions
and ecosystem function.

Figure 2 | Effect of warming and elevated CO₂ on growing and reproductive
season length (2007-2011). Values are the mean number of days difference
between treatment and control conditions for start, end and duration; see Fig. 3
for corresponding s.e.m. and significant effects (n = 5 plots). Negative values
indicate earlier onset of events or shortening of growing or reproductive season
length. See Methods for explanation of missing data.


In conditions of elevated CO₂ (CT, where C is relatively high CO₂),
growing season duration was further lengthened relative to warming alone
(cT) through the delay of canopy senescence (Figs 2, 3 and Extended Data
Table 1). In 2009, when spring was cool and annual precipitation was high,
elevated CO₂ extended the growing season by delaying senescence of
A. frigida by 29 days in the warmed ecosystem. Although the magnitude
of the response was less, the growing season was also significantly
increased in 2008, when under conditions of elevated CO₂ senescence
occurred 6 days later in the warmed ecosystem. On average during our
Figure 3 | Effect of warming and elevated CO₂ on timing of annual life cycle
events (2007-2011). a, b, Values are day of year (DOY) for start and end of
growing (a) and reproductive seasons (b) (means ± 1 s.e.m., n = 5 plots).
Significant effects (P < 0.05, two-sided) from a four-way ANOVA testing
temperature (Temp.; T), elevated CO₂ (C), species (S), year (Y) and all
interactions are reported at the top of each row of panels. Species, year and their
interactions were highly significant for all events (P < 0.001) and are not listed.
See Extended Data Table 1 for complete ANOVA results.
Figure 4 | Effect of warming and elevated CO₂ on the duration of species’
active and reproductive periods. a, b, Values are means across years
(2007-2011) for species’ active period duration (a) and reproductive period
duration (b) (means ± 1 s.e.m., n = 5 plots). Sub-shrub is indicated by filled
black circles, cool-season grasses by open circles, warm-season grass by filled
triangles, forb by filled grey circles. Significant effects (P < 0.05, two-sided) are
reported as in Fig. 3 at the top of each group of panels. See Extended Data
Table 2 for complete ANOVA results.
5-year study, the growing season ended 7.6 days later due to warming and
elevated CO₂ (CT) relative to warming alone (cT), and was 14.2 days
longer. Owing to warming alone (cT versus ct), the growing season began
4.7 days earlier and was 6.2 days longer on average. The effect of warming
alone probably cannot account for the observed change in growing
season length of ~2 to 5 days per decade during the mid- to late twentieth
century.

Our results demonstrate that the effects of warming and elevated CO₂
(CT) on annual life cycle events that determine growing and reproductive
season lengths depend on climate, varying in the magnitude and even
direction of response between years (Figs 2, 3 and Extended Data Table 1).
For example, in 2010, a year with low late summer and autumn precipitation
and low autumn soil water content (Fig. 1), elevated CO₂ (Ct and
CT) did not affect canopy senescence (Fig. 3). Although elevated CO₂ (Ct
and CT) led to greater autumn soil water content in all years (Fig. 5 and
Extended Data Table 3), in 2010 the water savings may not have been
sufficient to increase water availability above a threshold (~10%) that
corresponds to the permanent wilting point (11%) (ref. 19). In the warmed
ecosystem, elevated CO₂ (CT) led to the greatest increase in autumn soil
water content in 2008 and 2009 (Fig. 5), the years in which the greatest
delays in canopy senescence due to elevated CO₂ (CT) occurred (Fig. 3).

Increasing the CO₂ concentration in the warmed ecosystem (CT versus
cT) lengthened the active period of all grass species (Fig. 4 and Extended
Data Table 2). Even the early growing grass K. macrantha, which senesced
early due to warming alone (cT), showed delayed senescence under elevated
CO₂ and warming conditions (CT). Warming and elevated CO₂
(CT) led to a shorter reproductive season in 3 of 4 years relative to warming
alone (cT) by decreasing the advance in flowering date and earlier seed
maturation by A. frigida in 2007 and 2008 (Figs 2 and 3). Elevated CO₂
tended to decrease or have no effect on species’ reproductive period (Fig. 4
and Extended Data Table 2). Thus, in the warmed ecosystem, elevated
CO₂ (CT versus cT) caused species to remain active longer after seed
maturation, which would not benefit fitness in the year in question but
may affect it in subsequent years.

Altered flowering times and species’ reproductive periods may have
long-term consequences for plants and other trophic levels. Other studies
have also found that elevated CO₂ has a greater effect on flowering times
under warming conditions, as well as a greater effect on late-flowering
species. Our data indicate that higher atmospheric CO₂ concentrations
may be contributing to observed changes over time in flowering patterns,
such as a shorter reproductive season and greater asynchrony with pollinators,
which have previously been attributed to warming.

Furthermore, the dominant hypothesis among global change ecologists
is that Earth’s longer growing seasons are due to climate warming alone.
Our results suggest that this hypothesis needs modification to incorporate
the effects of elevated CO₂. We provide evidence in multiple years and of
a mean effect over the 5-year study that elevated CO₂ further increases
growing season length in a warmed, temperate plant community. In many
ecosystems, sufficient water availability is needed to sustain plant tissues
from summer into autumn. Dry conditions during warm years have led to
early senescence and even the death of long-lived plants. Elevated CO₂
counteracts the negative effect of warming on water availability (Fig. 5),
often delaying the timing of plant life cycle events (Fig. 3). The effects of
Figure 5 | Effect of warming and elevated CO₂ on autumn soil water content
(5-25 cm, September-October, 2007-2011). Values are means for soil water
content corresponding with the timing of canopy senescence (means ± 1 s.e.m.,
n = 5 plots). Significant effects (P < 0.05, two-sided) are reported as in Fig. 3.
See Extended Data Table 3 for complete ANOVA results.
warming and elevated CO₂ vary across species, events and years. At the
community level, the different responses lead to a longer growing season
in most years, because only one species within the plant community needs
to leaf earlier for spring to begin earlier and a different species can senesce
later to extend autumn. We demonstrate that warming fairly consistently
leads to earlier growth in spring and elevated CO₂ to later senescence
in autumn, with both mechanisms leading to a longer growing season.

The stature of grasslands and their ability to encompass thousands of
individual plants, many species, and different growth forms within a
relatively small area make them ideal ecosystems in which to conduct
global change experiments. For example, in our experiment, the varied
responses of the three cool-season grasses, K. macrantha, H. comata
and Pascopyrum smithii, a species that did not affect growing or reproductive
season length in any year, suggest that different species, even
within a growth form, respond in unique ways to warming and elevated
CO₂ (Figs 3 and 4). As a result, plant community response did not depend
on a specific species or growth form, and we expect that responses would
be similar in other temperate to polar plant communities, especially in
years or biomes where water is a limiting factor.

Although considerably less than the ~200 p.p.m. of CO₂ enrichment
that occurred in our experiment, the ~60 p.p.m. increase in global atmospheric
CO₂ concentrations since the 1970s is probably sufficient to elicit
significant stomatal closure, resulting in some water savings and an
effect on phenology, as in our experiment. Certainly the ~115 p.p.m.
increase in global atmospheric CO₂ concentrations since industrialization
has been more than enough to elicit considerable CO₂-induced water
savings and affect growing season length, although data on growing
season length are available primarily from the mid-twentieth century.
Ongoing increases in ambient CO₂ are expected to continue to shift the
timing of species’ reproductive periods and senescence, and thus the
duration of the growing and reproductive seasons.


METHODS SUMMARY 


The experiment, initiated in 2006, is located in temperate grassland in Wyoming,
United States (41° 11’ N, 104° 54’ W). It includes two levels of temperature (ambient
and warmed, 1.5 and 3.0 °C warmer during the day and night, respectively) and two
levels of atmospheric CO₂ concentrations (ambient and elevated, 385 p.p.m.v. and
600 p.p.m.v. CO₂, respectively) in a factorial combination with five replicate plots per
treatment. T-FACE technology was used for increasing the temperature. Free air
CO₂ enrichment (FACE) technology was used for elevating CO₂ (ref. 27). Further
description of the experiment can be found elsewhere.

The timing of the four annual life cycle events that determine the start and end of 
species’ active and reproductive periods (leaf emergence, flower production, seed 
maturation and canopy senescence) was observed weekly for six common species. 
The most abundant species in each growth form were chosen. From 2007 to 2009, 
individual plants were marked upon emergence and monitored for the duration of the 
growing season. The timing of each event was characterized by the mean value for the 
marked individuals of each species. In 2010 and 2011, the timing of each event was 
characterized by the median value, the point at which an event was completed by half 
the typical number of marked individuals for each species within a plot. Further 
description is included in the Methods. 
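The median-based timing used in 2010 and 2011 can be read as a threshold rule on the weekly counts. The sketch below is one possible reading under stated assumptions, not the authors' protocol: the function name, the weekly observation days and the cumulative counts are all hypothetical, and the event date is taken as the first observation on which at least half the typical number of marked individuals had completed the event.

```python
def median_event_day(obs_days, cumulative_completed, typical_n):
    """Return the first observation day on which half of typical_n is reached.

    obs_days: days of year of the weekly observations, in increasing order.
    cumulative_completed: individuals that had completed the event by each day.
    typical_n: typical number of marked individuals for the species in a plot.
    """
    threshold = typical_n / 2
    for day, completed in zip(obs_days, cumulative_completed):
        if completed >= threshold:
            return day
    # The event never reached the threshold in the observed period.
    return None
```

For example, with weekly visits on DOY 100, 107, 114 and 121, cumulative counts of 0, 2, 5 and 9, and ten typical individuals, the rule returns DOY 114.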

We analysed the data across the years through an analysis of variance (ANOVA) for
each life cycle event, duration of active and reproductive periods, and autumn soil
water content, using Proc Mixed to test the main effects of temperature, elevated CO₂,
species (if applicable) and year, and all their interactions (SAS version 9.2). Soil type
was included as a random effect and was not significant. Mixed model ANOVA allows
for unequal variances and data were near a normal distribution. Analyses were done on
untransformed data.


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 

Site description. The Prairie Heating and Carbon Dioxide Enrichment (PHACE) 
experiment, initiated in 2006, is located west of Cheyenne, Wyoming, United States at 
the USDA-ARS High Plains Grasslands Research Station in the US Great Plains (41° 
11’ N, 104° 54’ W, elevation 1,930 m). This is a Northern mixed-grass prairie ecosys- 
tem, with a plant community composed of 55% cool-season grasses, 25% warm-season 
grasses, and 20% sedges, forbs and small shrubs. Total annual precipitation averages 
38.5 cm and mean daily air temperatures range from −2.5 °C in January to 17.5 °C in 
July. The average wind speed is 6 m s⁻¹ with gusts up to 35 m s⁻¹. The site comprises 
two distinct soil types: an Ascalon variant loam (fine loamy, mixed mesic) at the north 
end of the field and an Altvan loam (fine loamy over sandy, mixed mesic) on the south 
end. The site has a history of moderate grazing from 1928 until the PHACE project 
began in 2006. 

Experimental design. The experiment includes two levels of temperature (ambient 
and warmed, 1.5 and 3.0°C warmer during the day and night, treatments t and T, 
respectively) and two levels of atmospheric CO2 concentrations (ambient 385 p.p.m.v. 
and elevated 600 p.p.m.v. CO2, treatments c and C, respectively) in a factorial 
combination with five replicate plots per treatment (ct, cT, Ct and CT) for a total of 20 plots. 
Warming and elevated CO2 treatments were randomly assigned to the 3.3 m diameter 
circular plots. T-FACE technology for increasing temperature was implemented on 
10 April 2007, after leaf emergence by cool-season grasses and shrubs, and warmed 
plots year round for the duration of the experiment”®. As warming began after the first 
species leafed in 2007, leaf emergence data were omitted for all species in 2007 and 
growing season length was not calculated. Dummy heaters were installed in non- 
heated plots to eliminate response differences that may result from shading or 
other influences caused by the heating apparatuses. Free air CO2 enrichment (FACE) 
technology was used for elevating CO2 and began in 2006 (ref. 27). The CO2 
fumigation system ran continuously during the growing season. The warming treatment 
effectively accelerated the accumulation of growing degree days in all years 
(Extended Data Fig. 1). When placed in the historical context of the last century, below 
average precipitation fell in 2007, 2008 and 2010 with above average precipitation 
falling in 2009 and 2011. Further description of the experiment, including the instru- 
mentation used for monitoring climate and microclimate, is available elsewhere”. 
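
The factorial layout described above can be sketched in a few lines of Python. This is only an illustration of the 2 x 2 design with five replicates; the function name `build_design` and the record fields are hypothetical, and the random assignment of treatments to physical plots is not modelled.

```python
from itertools import product

# Treatment codes follow the text: c/C = ambient/elevated CO2, t/T = ambient/warmed.
CO2_LEVELS = {"c": 385, "C": 600}                      # p.p.m.v.
WARMING_LEVELS = {"t": (0.0, 0.0), "T": (1.5, 3.0)}    # (day, night) warming in deg C
REPLICATES = 5


def build_design():
    """Return one record per plot for the 2 x 2 factorial layout (hypothetical helper)."""
    plots = []
    for co2, temp in product(CO2_LEVELS, WARMING_LEVELS):
        for rep in range(1, REPLICATES + 1):
            plots.append({
                "treatment": co2 + temp,               # ct, cT, Ct or CT
                "co2_ppmv": CO2_LEVELS[co2],
                "warming_day_night_c": WARMING_LEVELS[temp],
                "replicate": rep,
            })
    return plots


design = build_design()
```

Four treatment combinations times five replicates gives the 20 plots stated in the text.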

Phenology observations. The timing of four life cycle events that determine the start 
and end of species’ active and reproductive periods (leaf emergence, flower production, 
seed maturation and canopy senescence) was observed weekly for six common species. 
The most abundant species in each growth form were chosen, including the one sub- 
shrub (A. frigida, L.), a warm-season grass (Bouteloua gracilis, Lag. ex Griffiths), three 
cool-season grasses (H. comata (Elias) Barkworth; K. macrantha (Ledeb.) Schult; and 
P. smithii (Rydb.) Á. Löve), and a widespread forb (Sphaeralcea coccinea (Nutt.) Rydb.). 
Leaf emergence was characterized by the first new, green leaf to appear on a shoot. The 
first open flower (forb and sub-shrub) or inflorescence to emerge from the leaf sheath 
(grasses) determined the timing of flower production. Flower desiccation (forb and 
sub-shrub) and seed head colour and release (grasses) identified the timing of seed 
maturation. And the timing of canopy senescence was characterized by the first sign of 
full canopy leaf death or loss. 

From 2007 to 2009, individual plants of each species were marked upon emergence 
(typically 12) and monitored for the duration of the growing season. The timing of 
an event was characterized by the mean value (day of year (DOY)) for the marked 
individuals of each species. Some species did not flower in all years. In 2009, incomplete 
data were collected on reproductive life cycle events, so reproductive season length was 
not determined. In 2010 and 2011, the timing of an event was characterized as the 
point at which a minimum number of individuals (typically 6) for each species within a 
plot had completed an event, representing the median value. Both approaches char- 
acterize central tendency for event timing across multiple individuals per plot. We used 
these data to determine changes in growing and reproductive season length, presenting 
data annually, and across years for the duration of species’ active and reproductive 
periods. The start and end of the growing season were characterized by the mean 
across replicate plots for leaf emergence by the first species to leaf and for canopy 
senescence by the last species, respectively. Similarly, the start and end of the repro- 
ductive season were characterized by the date on which the first species flowered and 
seed maturation by the last species, respectively. 
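
The two ways of summarizing event timing, and the season-length definition, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, inputs are day-of-year (DOY) values, and the per-species values passed to `season_length` are assumed to be means across replicate plots already.

```python
from statistics import mean


def event_doy_mean(individual_doys):
    """2007-2009 approach: mean DOY across the marked individuals of a species."""
    return mean(individual_doys)


def event_doy_threshold(individual_doys, min_individuals):
    """2010-2011 approach: DOY by which `min_individuals` plants had completed the event.

    With about 12 marked plants and a threshold of 6 this approximates the median.
    Returns None if too few individuals completed the event.
    """
    ordered = sorted(individual_doys)
    if len(ordered) < min_individuals:
        return None
    return ordered[min_individuals - 1]


def season_length(start_doy_by_species, end_doy_by_species):
    """Days from the first species' start event to the last species' end event."""
    return max(end_doy_by_species.values()) - min(start_doy_by_species.values())
```

For the growing season the start event is leaf emergence and the end event is canopy senescence; for the reproductive season they are flower production and seed maturation.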

Climate and microclimate. Mean daily and mean annual temperature (MAT) and 
mean daily and total annual precipitation (TAP) were calculated on the basis of 
half hourly data from a meteorological station (HOBO, Onset, Inc.) at the field site 
(Extended Data Fig. 1). Growing degree day calculations were completed using data 
from infrared radiometers located within the experimental plots and a base 
temperature of 0 °C. In each plot, the volumetric soil water content (SWC) was measured 
hourly at 10, 20, 40, 60 and 80 cm depth (EnviroSMART probe; Sentek Sensor 
Technologies). Daily means were calculated for SWC at the primary rooting depth 
(5-25 cm) by averaging the values for the sensors at 10 and 20 cm depth (Extended Data 
Fig. 2). We present data on mean spring air temperature across DOY 60-120, annual 
precipitation for the study site, and mean growing season (DOY 60-334) and autumn 
(DOY 244-304) SWC for the control plots (Fig. 1). Mean autumn SWC is also pre- 
sented for experimental plots (Fig. 5). 
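
As a rough sketch of the two microclimate summaries above: the text gives only the base temperature (0 °C) for growing degree days, so the accumulation rule below (daily mean temperature minus base, floored at zero) is an assumption, not necessarily the authors' exact formula. Function names are hypothetical.

```python
def growing_degree_days(daily_mean_temps_c, base_c=0.0):
    """Cumulative growing degree days from daily mean temperatures (assumed rule)."""
    total = 0.0
    cumulative = []
    for temp in daily_mean_temps_c:
        total += max(temp - base_c, 0.0)   # days below the base contribute nothing
        cumulative.append(total)
    return cumulative


def rooting_depth_swc(swc_10cm, swc_20cm):
    """Daily SWC for the 5-25 cm rooting depth: average of the 10 and 20 cm sensors."""
    return [(a + b) / 2 for a, b in zip(swc_10cm, swc_20cm)]
```

Comparing the cumulative curves of warmed and unwarmed plots is one way to see the accelerated accumulation of growing degree days reported for the warming treatment.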

Data analysis. We analysed the data across years through ANOVA for each life cycle 
event, duration of active and reproductive periods, and autumn SWC, using Proc 
Mixed to test the main effects of temperature, elevated CO2, species (if applicable) and 
year, and all their interactions (SAS version 9.2). For all ANOVAs, soil block based on 
the two soil types at the site was included as a random effect in the analyses and was not 
significant. Mixed model ANOVA allows for unequal variances and data were near a 
normal distribution. Analyses were done on untransformed data. Significant main 
effects and interactions (P < 0.05) are reported on figures with complete ANOVA 
results reported in Extended Data Tables 1-3. 
Extended Data Table 1 | ANOVA results for the timing of annual life cycle events 2007-2011 


Years 2007-2011               Leaf emergence      Flower production   Seed maturation     Canopy senescence
Main effects                  F         P         F         P         F         P         F         P
Temp                          143.7*    <0.001    42.0      <0.001    12.5      <0.001    0.4       0.503
CO2                           0.3       0.594     7.5       0.007     17.7      <0.001    9.5       0.002
Species                       136.1     <0.001    1639.5    <0.001    1557.8    <0.001    74.0      <0.001
Year                          50.3      <0.001    42.2      <0.001    9.7       <0.001    118.8     <0.001
Temp x CO2                    1.0       0.325     0.1       0.793     0.0       0.998     2.5       0.117
Temp x species                5.1       <0.001    5.1       <0.001    3.4       0.006     3.6       0.003
CO2 x species                 1.0       0.419     3.4       0.005     4.0       0.002     1.9       0.094
Temp x year                   28.0      <0.001    0.8       0.529     6.0       <0.001    1.4       0.224
CO2 x year                    0.1       0.956     0.7       0.605     6.1       <0.001    5.8       <0.001
Species x year                10.3      <0.001    27.1      <0.001    94.9      <0.001    12.1      <0.001
Temp x CO2 x species          0.4       0.859     2.2       0.060     0.3       0.928     1.5       0.191
Temp x CO2 x year             0.2       0.905     1.1       0.342     0.8       0.556     0.6       0.685
Temp x species x year         1.8       0.043     1.5       0.106     2.3       0.002     1.3       0.154
CO2 x species x year          0.7       0.824     1.6       0.054     5.2       <0.001    1.5       0.072
Temp x CO2 x species x year   1.1       0.377     1.7       0.053     1.1       0.381     1.6       0.048


* Significant effects and interactions are in bold. 
Extended Data Table 2 | ANOVA results for the duration of species’ 
active and reproductive periods 


Years 2007-2011               Active period duration  Reproductive period duration
Main effects                  F           P           F           P
Temp                          20.1*       <0.001      10.0        0.002
CO2                           4.2         0.04        0.3         0.59
Species                       103.9       <0.001      84.6        <0.001
Year                          34.0        <0.001      42.1        <0.001
Temp x CO2                    0.1         0.72        0.0         0.98
Temp x species                6.1         <0.001      1.3         0.28
CO2 x species                 0.9         0.48        0.3         0.91
Temp x year                   7.7         <0.001      3.3         0.01
CO2 x year                    2.3         0.08        2.6         0.04
Species x year                10.3        <0.001      36.2        <0.001
Temp x CO2 x species          1.3         0.28        1.5         0.18
Temp x CO2 x year             2.0         0.12        1.6         0.17
Temp x species x year         2.7         0.001       1.5         0.09
CO2 x species x year          1.4         0.14        2.4         0.002
Temp x CO2 x species x year   1.4         0.14        2.4         0.005


* Significant effects and interactions are in bold. 
Extended Data Table 3 | ANOVA results for autumn soil water content 


Years 2007-2011               Autumn soil water content
Main effects                  F           P
Temp                          17.6*       <0.001
CO2                           57.5        <0.001
Year                          62.2        <0.001
Temp x CO2                    2.9         0.09
Temp x year                   5.0         0.001
CO2 x year                    1.5         0.21
Temp x CO2 x year             0.2         0.94


* Significant effects and interactions are in bold. 
Emergence of reproducible spatiotemporal activity during motor learning 


Andrew J. Peters, Simon X. Chen & Takaki Komiyama 


The motor cortex is capable of reliably driving complex movements” 
yet exhibits considerable plasticity during motor learning* ’°. These 
observations suggest that the fundamental relationship between 
motor cortex activity and movement may not be fixed but is instead 
shaped by learning; however, to what extent and how motor learning 
shapes this relationship are not fully understood. Here we addressed 
this issue by using in vivo two-photon calcium imaging” to monitor 
the activity of the same population of hundreds of layer 2/3 neurons 
while mice learned a forelimb lever-press task over two weeks. Excit- 
atory and inhibitory neurons were identified by transgenic labelling’*”’. 
Inhibitory neuron activity was relatively stable and balanced local 
excitatory neuron activity on a movement-by-movement basis, whereas 
excitatory neuron activity showed higher dynamism during the ini- 
tial phase of learning. The dynamics of excitatory neurons during the 
initial phase involved the expansion of the movement-related popu- 
lation which explored various activity patterns even during similar 
movements. This was followed by a refinement into a smaller popu- 
lation exhibiting reproducible spatiotemporal sequences of activity. 
This pattern of activity associated with the learned movement was 
Figure 1 | Lever-press task and chronic calcium imaging of excitatory and 
inhibitory populations. a, Task schematic. b, Lever movement traces in 
rewarded trials from one mouse. Grey, 10 individual trials; black, average of all 
trials; red dotted line, movement onset. c, Top: median pairwise correlation 
coefficients of rewarded movements on individual trials over 3 s, averaged 
across animals. Bottom: pairwise movement correlation on individual trials 
within and across sessions corresponding to the black and grey arrows 
indicated on the top, respectively. Individual movements became more similar 
within (r = 0.35, P < 0.001) and across (r = 0.37, P < 0.001) sessions. See 
Methods for sample size. d, Top and middle: GCaMP5G expression in layer 2/3 
neurons imaged 2 weeks apart. Insets: magnified images of outlined areas. 
Bottom: merge of tdTomato expressed in all inhibitory neurons (red) and 
GCaMP5G (green). e, Top: activity of all simultaneously imaged movement- 
related 38 excitatory (green) and 42 inhibitory (red) neurons from one animal. 
Each row represents a neuron. Middle: ΔF/F0 traces from one neuron each and 


unique to expert animals and not observed during similar move- 
ments made during the naive phase, and the relationship between 
neuronal activity and individual movements became more consistent 
with learning. These changes in population activity coincided with 
a transient increase in dendritic spine turnover in these neurons. 
Our results indicate that a novel and reproducible activity-movement 
relationship develops as a result of motor learning, and we speculate 
that synaptic plasticity within the motor cortex underlies the emer- 
gence of reproducible spatiotemporal activity patterns for learned 
movements. These results underscore the profound influence of learn- 
ing on the way that the cortex produces movements. 

We developed a cued lever-press task performed by mice under a two- 
photon microscope (Fig. 1a), similar to other recently reported tasks'*"». 
Briefly, a lever press beyond the set threshold during an auditory cue 
was rewarded with water (Methods). Mice were trained with this task 
daily for 2 weeks (n = 10). Even though mice achieved a reward in most 
trials, the timing of their behaviour improved in later sessions (Extended 
Data Fig. 1). Furthermore, lever movements on individual trials became 
more stereotyped over time (Fig. 1b). The reproducibility of movement 
average ΔF/F0 of all imaged neurons of each type (152 excitatory and 77 
inhibitory). Bottom: task-related events; yellow shading indicates movement 
epochs. f, Fractions of active inhibitory neurons and excitatory neurons during 
rewarded movements are correlated on a movement-by-movement basis 
(r = 0.63-0.67, P < 0.001). This relationship is stable throughout learning 
(P = 0.92, one-way ANOVA comparison of median excitatory/inhibitory 
ratios). g, Pairwise correlation coefficients between inhibitory and excitatory 
neuron activity decrease with distance (P < 0.001, comparison between pairs 
within 150 µm and all other pairs, Wilcoxon rank sum test, n = 653,046 
excitatory-inhibitory pairs total). h, Individual movement-related inhibitory 
neurons are classified on more sessions, showing that excitatory neurons are on 
average more dynamic than inhibitory neurons (P < 0.001, Wilcoxon rank 
sum test, n = 473 and 231 movement-related excitatory and inhibitory 
neurons, respectively). All error bars are s.e.m. 
kinematics was evident in higher correlation of rewarded movements 
on individual trials within and across later sessions (Fig. 1c). The motor 
cortex is necessary for this task, as lesions before training prevented the 
emergence of movement stereotypy, and acute inactivation by pharma- 
cology or optogenetics impaired task performance (Extended Data Fig. 2). 
To identify how the activity of motor cortex neuronal ensembles is 
modified during this learning, we combined the lever-press task with 
chronic two-photon calcium imaging*”®. In this study we focused on 
neurons in layer 2/3, the major input layer capable of driving deeper layer 
neurons to produce motor cortex outputs’””*. Before training, we injected 
an adeno-associated virus encoding the Ca2+ indicator GCaMP5G”” 
into the right forelimb area of the motor cortex to express GCaMP5G 
in all neuron types. Optogenetic stimulation of this area evoked forelimb 
movements (Extended Data Fig. 3). GCaMP5G fluorescence reported 
spiking activity with high temporal precision (jitter of spikes and calcium 
events = 7.1 ± 41.4 ms, median ± standard deviation (s.d.); Extended 
Data Fig. 4). We used transgenic mice that express tdTomato in all 
GABAergic inhibitory neurons (Gad2-IRES-Cre (ref. 12); Rosa-LSL- 
tdTomato (ref. 13)) to identify excitatory (only expressing GCaMP5G) 
and inhibitory (expressing both tdTomato and GCaMP5G) neurons. 
Two weeks after surgery, hundreds of neurons were imaged through a 
chronic window. A total of 202 ± 18 (mean ± standard error of the mean 
(s.e.m.)) neurons were imaged in each animal, with 20.9 ± 2.6% being 
inhibitory, consistent with the composition of the cortex”®, 
Figure 2 | Dynamics of spatiotemporal activity of excitatory neurons during 
learning. a, Dynamics of three excitatory neurons. Top rows: images of 
neurons confirming reliable identification. Bottom rows: black, mean ΔF/F0; 
grey, s.e.m.; arrow, movement onset. b, Mean fraction of excitatory neurons 
classified as movement-related in each session (increase over sessions 1-3, 
P < 0.01; decrease over sessions 4-14, P < 0.01, Methods). c, Dynamic 
population of excitatory neurons from one mouse. Green, movement-related 
excitatory neurons; grey, non-classified excitatory neurons. d, Correlation of 
population activity of all excitatory neurons during rewarded movements 
across sessions (Methods). The population of movement-related excitatory 
neurons became more stable in later sessions (P < 0.001, session 1-4 pairs 
versus session 10-14 pairs, Wilcoxon rank sum test). e, Average fraction of 
excitatory neurons that are active in each individual movement out of all 
excitatory neurons remains stable throughout learning (r = 0.00, P = 0.98). 
f, Cumulative distribution of fraction of trials in which each movement-related 
excitatory neuron is active. In sessions 3-4, individual movement-related 
neurons are active in fewer trials compared to sessions 1-2 or 11-14 (P < 0.001, 
We imaged the activity of the same population of excitatory and inhibitory 
neurons over the course of 2 weeks while mice simultaneously 
learned and performed the lever-press task (Fig. 1d, n = 7 mice). High 
correlations of neural activity and lever-press movements were evident 
both at the level of single neurons and population average in both excitatory 
and inhibitory neurons (Fig. 1e). A large fraction of imaged neurons 
showed significantly more activity during lever-press movements 
and were thus considered movement-related (51.4 ± 5.7% of imaged 
neurons, 44.0 ± 6.2% of excitatory and 78.7 ± 5.7% of inhibitory, were 
classified as movement-related in at least one session; Methods). Movement- 
related neurons did not show obvious spatial clustering (Extended Data 
Fig. 5). 

We investigated the relationship between excitatory and inhibitory 
populations. We found a positive correlation between the fraction of active 
inhibitory and excitatory neurons on a movement-by-movement basis; 
during movements that activated a larger fraction of excitatory neurons, 
a larger fraction of inhibitory neurons were also active. This relation- 
ship of excitatory and inhibitory activity remained constant for the entire 
2 weeks of imaging (Fig. 1f). Individual inhibitory neurons were particularly 
correlated with nearby excitatory neurons within 150 µm, consistent 
with their connectivity’’” (Fig. 1g). This local matching of excitatory 
and inhibitory activity probably provides a basis for the balance between 
excitatory and inhibitory inputs to individual neurons observed in the 
cortex (reviewed in ref. 20). Even though the ratio of excitatory and inhibitory 
Kolmogorov-Smirnov test). g, Standard deviation of the timing of activity 
onsets for movement-related excitatory neurons decreases over sessions 
(r = −0.24, P < 0.001). Neurons that were active in fewer than five trials of a 
given session were excluded from this analysis. h, Pairwise trial-to-trial 
correlation of temporal population activity vectors increases with learning 
(r = 0.43, P < 0.001). Temporal population activity vector was a concatenation 
of the activity traces of all movement-related neurons and thus maintained 
temporal information within each movement. i, Activity onsets of excitatory 
neurons from one animal that are movement-related and active in at least 10% 
of trials on the sessions indicated. Arrow, movement onsets; colours, individual 
neurons sorted according to their preferred timing. Note that same colours 
across sessions are not necessarily the same neurons. j, Maximum-normalized 
average activity from all movement-related neurons from all animals in session 
2 (left, 106 neurons) and session 14 (right, 84 neurons) aligned to movement 
onset (arrow). Activity timing is refined over time, shown by narrower peaks 
and lower background in session 14, and shifts towards movement onset. See 
Methods for sample size. All error bars are s.e.m. 
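
The temporal population activity vector in panel h is described as a concatenation of the activity traces of all movement-related neurons. A minimal sketch of that idea and of the pairwise trial-to-trial correlation follows; the data layout (one list of per-neuron traces per trial) and all function names are assumptions made for illustration, not the authors' analysis code.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def population_vector(trial):
    """Concatenate per-neuron traces so timing within the movement is preserved."""
    return [value for trace in trial for value in trace]


def mean_pairwise_correlation(trials):
    """Average correlation over all pairs of trials within a session."""
    vectors = [population_vector(trial) for trial in trials]
    return mean(pearson(a, b) for a, b in combinations(vectors, 2))
```

Because the traces are concatenated rather than averaged over time, two trials correlate highly only if the same neurons are active at the same times relative to movement onset.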
activity on a trial-to-trial basis was stable, the identity of movement- 
related neurons was dynamic. In particular, excitatory neurons on average 
had a higher degree of turnover than the inhibitory population, indi- 
cating that the excitatory population is more dynamic during learning 
(Fig. 1h and Extended Data Fig. 6a). We therefore focused on excitatory 
neurons for the following analyses. 

Many excitatory neurons were transiently movement-related (Fig. 2a). 
In the initial phase of learning, a large fraction of excitatory neurons 
developed movement-related activity, resulting in a marked increase 
of the movement-related population (Fig. 2b, c). After this initial expan- 
sion, the fraction of movement-related neurons decreased gradually 
through the remaining course of the experiment (Fig. 2b, c and Extended 
Data Fig. 6b). This resulted in a smaller and more stable population of 
movement-related neurons towards the end of learning (Fig. 2d). The 
expansion and refinement was not seen during spontaneous movements 
without training (Extended Data Fig. 6c). Despite these changes in the 
ensemble of movement-related neurons during learning, the average frac- 
tion of excitatory neurons active on each trial remained stable (Fig. 2e). 
This constant level of activity was maintained despite a changing size 
of movement-related populations because of the corresponding shift 
in the frequency of activity in individual neurons (Fig. 2f). Various com- 
binations of excitatory neurons were therefore used within the motor 
cortex during the initial phase of learning, followed by a refinement of 
the population to form a stable network associated with the learned 
movement. 

We next examined the timing of activity of individual neurons. As a 
population, the activity of movement-related excitatory neurons diverged 
from baseline at 105 ms before the movement onset and continued 
throughout the duration of movements (Extended Data Fig. 6d). During 
the first few sessions of learning, movement-related excitatory neurons 
showed variable timing of activity on individual trials relative to the move- 
ment onset. Conversely, movement-related neurons in later sessions 
showed reproducible activity timing relative to movement onset (Fig. 2g 
and Extended Data Fig. 6e). As a result, the temporal activity pattern 
became progressively more stable during learning (Fig. 2h). This activity 
sequence tiled the entire duration of movement (Fig. 2i, j and Extended 
Data Fig. 6f). Furthermore, the population activity shifted towards the 
beginning of movements over the course of the experiment (Fig. 2j and 
Extended Data Fig. 6g). 

The observed stabilization of motor cortex activity may result from 
the selection of a particular activity-movement pair out of many that are 
explored during initial learning. In this case, activity during the learned 
movement and activity during similar movements made early in learn- 
ing should resemble each other. Alternatively, the ‘learned’ activity pattern 
may be unique to the expert stage after learning. To distinguish between 
these possibilities, we evaluated the relationship between movement and 
neural activity. We defined the ‘learned’ patterns of activity and 
movement as the averages of a randomly chosen 50% of trials from the expert 
sessions (sessions 10-14, Methods). When the remaining 50% of trials 
from the expert sessions were sorted according to the similarity of the 
movement in each trial with the learned movement pattern, a clear rela- 
tionship between movement and activity became evident—the similarity 
of activity to the learned activity pattern increased with the similarity 
of movements to the learned movement (Fig. 3a, b). Remarkably, this 
relationship was much weaker when trials from naive sessions (sessions 
1-3) were considered. Regardless of the similarity of movement to the 
learned movement pattern, the similarity of activity to the learned activity 
pattern was consistently low in naive sessions (Fig. 3a, b). In other words, 
the learned activity pattern was reproducibly observed only when the 
expert animals made the learned movement, whereas similar movements 
made in naive sessions were accompanied by very different activity pat- 
terns. Furthermore, the general relationship between activity and move- 
ment in pairs of trials became more consistent after learning, whereas 
activity in naive animals was more variable regardless of movement sim- 
ilarity (Fig. 3c). These analyses of learning-related changes in population 
activity were performed using the entire movement periods. However, 
Figure 3 | Learning-related emergence of reproducible spatiotemporal 
activity. a, Heat maps show mean activity of movement-related excitatory 
neurons classified during expert sessions of all animals aligned to movement 
onset (left edge). Traces show mean lever movements. Trials are binned 
according to the correlations of the movements on those trials with the learned 
movement pattern (Methods). Left, learned patterns; top, naive sessions 
(sessions 1-3); bottom, expert sessions (sessions 10-14). b, Correlation of trial 
activity with the learned activity pattern increases with the correlation of trial 
movement with the learned movement pattern in expert sessions. Movements 
similar to the learned movement pattern but made in naive sessions display 
activity very different from the learned activity pattern (P = 0.83, 0.35 and 
<0.001 in the bins 1, 2 and 3-9, respectively, Wilcoxon rank sum test). 
c, Pairwise trial-to-trial correlation of temporal population activity vectors 
(defined as in Fig. 2h) plotted as a function of movement correlation on those 
trials. A stronger relationship between population activity and movement 
emerges during learning (P = 0.38, 0.18, 0.04, 0.002, <0.001 and 0.02 in bins 1, 
2, 3, 4, 5-8 and 9, respectively, Wilcoxon rank sum test). See Methods for 
sample size. All error bars are s.e.m. 


the results were similar when only the periods from movement onset 
to reward were considered (Extended Data Fig. 7). 
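
The split-half comparison described above (a 'learned' pattern from a random 50% of expert trials, then similarity of the remaining trials to that pattern) might be sketched as below. All names are hypothetical, each trial is assumed to be already flattened into one numeric vector, and the fixed seed is only there to make the sketch repeatable.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def split_half(trials, seed=0):
    """Randomly split trials into a template half and a held-out half."""
    rng = random.Random(seed)
    order = list(range(len(trials)))
    rng.shuffle(order)
    half = len(order) // 2
    return [trials[i] for i in order[:half]], [trials[i] for i in order[half:]]


def average_pattern(vectors):
    """Element-wise mean across trials: the 'learned' pattern."""
    return [mean(column) for column in zip(*vectors)]


def similarity_to_learned(vectors, learned):
    """Correlation of each trial with the learned pattern."""
    return [pearson(v, learned) for v in vectors]
```

Applying `similarity_to_learned` once to movement traces and once to activity vectors of the same held-out or naive trials gives the two quantities compared in Fig. 3b; holding out half of the expert trials avoids correlating a trial with a template it helped build.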

The plasticity of population activity described above could simply reflect 
changes in other brain areas providing inputs to these neurons. How- 
ever, synaptic plasticity within the motor cortex could also contribute 
to the changes in population activity. To test whether learning of this 
lever-press task induces synaptic plasticity in layer 2/3 of the motor 
cortex, we labelled sparse subsets of layer 2/3 neurons and chronically 
imaged spines on the same dendritic branches of excitatory neurons 
over the course of learning (n = 191 spines in 3 mice). Imaging was per- 
formed immediately before each training session in awake animals. We 
observed the formation of a number of dendritic spines during the ini- 
tial sessions of training, followed by the elimination of some of the spines 
that were present at the beginning of the experiment. Most (95%, 19 out 
of 20) of the spines that formed during training persisted for the entire 
2 weeks. At the population level, these changes resulted in a transient 
10% increase in the density of spines followed by return to the baseline 
Figure 4 | Learning-related plasticity of dendritic spines. a, Example layer 
2/3 excitatory neuron dendrites imaged in awake mice throughout learning. 
Arrowheads, added spine; arrows, eliminated spine; scale bars, 2 µm. 
b, Summary of spine dynamics in trained and control animals. Top: spine 
additions (black) and eliminations (grey) in each session. For control animals, 
data from all sessions are combined. Bottom: total spine number across sessions 
normalized to session 1 in each condition. Spine density transiently increases 
during learning (P < 0.001, control versus training sessions 4-7, Wilcoxon 
rank sum test). n = 191 spines in 3 mice. All error bars are s.e.m. 


(Fig. 4). These results are analogous to previous reports in motor cortex 
layer 5 neurons during different motor learning tasks**. Spines were 
largely stable in a separate group of animals that did not undergo training 
but otherwise were treated identically including water restriction and 
head fixation (Fig. 4b). The spine density was also stable in the hind- 
limb area in the motor cortex during learning (Extended Data Fig. 8). 
These results indicate that our lever-press task induces area-specific reor- 
ganizations of excitatory synapses onto layer 2/3 neurons during learning. 

Previous studies suggested a relatively stable tuning and population 
code of motor cortex neurons in well-trained animals”***. However, our 
understanding is quite limited as to the changes in the ensemble activity 
pattern during the transition from naive to expert stages. Our results 
indicate that the relationship between movements and activity is ini- 
tially inconsistent (that is, degenerate), and the early days of learning 
involve the expansion and exploration of various movement-related 
activity in the motor cortex. An increased variability of single-neuron 
activity in the motor cortex has been observed during learning of visu- 
omotor adaptation” and a brain-machine interface task”®. Such trial-to- 
trial variability has been proposed to provide the basis for exploration 
of possible network states and facilitate learning’®. Our results directly 
demonstrate such an exploration during initial learning at the popu- 
lation level. After this period of high variability, the activity-movement 
degeneracy is reduced and a reproducible temporal sequence of activity 
emerges in a stable population of excitatory neurons (Extended Data Fig. 9). 
Such spatiotemporal activity may orchestrate the temporal dynamics 
of the learned movement. Reproducible temporal patterns of popula- 
tion activity during learned movements are proposed to be generated 
by internal connections within the motor cortex’”*. We note that our 
results do not provide a causal link between local synaptic plasticity and 
changes in population activity. Nevertheless, we show that these pro- 
cesses occur during motor learning on similar timescales, which sup- 
ports the notion that local synaptic plasticity may generate a circuit to 
reproduce a particular spatiotemporal activity pattern. These new cir- 
cuits may be more efficient in driving movements, which could underlie 
the lower metabolic activity in the motor cortex observed during exe- 
cution of well-practiced movements”””**. Our study provides a glimpse 
of the emergence of population activity patterns for learned movements. 


METHODS SUMMARY 


Surgeries were performed to inject viruses in the right forelimb area of the motor 
cortex and implant a chronic window and head plate. For functional imaging, 
AAV2/1-Syn-GCaMP5G was injected in Gad2-IRES-Cre; Rosa-LSL-tdTomato mice; for 
structural imaging, a mixture of AAV2/1-CAG-FLEX-EGFP and AAV2/1-CMV-PI-Cre 
was injected in C57BL/6 mice. Imaging and behaviour experiments started 
at least 2 weeks after surgery and mice were at least 8 weeks of age. Imaging was 
performed in awake mice during (functional imaging) or right before (structural 
imaging) each behavioural session. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper.

METHODS


Animals. All procedures were in accordance with protocols approved by the UCSD 
Institutional Animal Care and Use Committee and guidelines of the National Insti- 
tute of Health. Mice (calcium imaging: cross between Gad2-IRES-Cre (ref. 12) and 
Rosa26-CAG-LSL-tdTomato (ref. 13), Jackson laboratories; structural imaging: 
C57BL/6, Charles River Laboratory) were group housed in disposable plastic cages 
with standard bedding in a room with a reversed light cycle (12 h-12 h). After
surgery, animals were singly housed. Experiments were performed during the dark
period. No randomization was used, but internal controls were included whenever 
possible. 

Surgery. Adult mice (6 weeks or older, male and female) were anaesthetized with 
isoflurane and injected with dexamethasone (2 mg kg⁻¹) and baytril (10 mg kg⁻¹)
intramuscularly to prevent brain swelling and infection. A custom head-plate was 
glued to the skull and craniotomy (~2 mm diameter) was performed as described 
over the right caudal forelimb area. Viruses (UPenn Vector Core; calcium imaging: 
AAV2/1-syn-GCaMP5G diluted 1:1-3 in saline, 5-6 sites; structural imaging: AAV2/ 
1-CAG-FLEX-EGFP (1:1) and AAV2/1-CMV-PI-Cre (1:5,000) diluted in saline, 
3 sites) were injected in the caudal forelimb area of the motor cortex around the 
coordinate of 300 µm anterior and 1,500 µm lateral from bregma, according to
previous microstimulation experiments'***-°**. We confirmed that optogenetic sti- 
mulation of this area evoked forelimb movements (Extended Data Fig. 3). This area 
also falls near the border of the abduction and adduction areas defined by ref. 2. For 
control experiments targeting the hindlimb area of the motor cortex, viruses were 
injected around the coordinate of 1,500 µm posterior and 1,500 µm lateral from
bregma, according to previous microstimulation experiments*”****”*. Each injection
consisted of ~20 nl at a depth of ~250 µm from the pia injected over 2-4 min
and each injection site was separated by ~500 µm. Pipettes were left in the brain for
3 min after injection to avoid backflow. After virus injections, a chronic imaging 
window was implanted consisting of a glass plug glued onto a larger glass base. The 
edges between the glass plug and the skull were filled with 1.5% agarose and the 
window was secured using dental acrylic. Buprenorphine (0.1 mg kg⁻¹) was injected
subcutaneously at the end of surgery. 

Behaviour. Three days after surgery, mice were water-restricted at 1 ml per day. After 
~14 days of water restriction, mice were trained daily for 14 days while two-photon 
imaging was applied simultaneously. The lever was built using a piezoelectric flexible
force transducer (LCL-113G, Omega Engineering) attached to a 1/16-mm-thick
brass rod. The voltage from the force transducer, which is proportional to the lever 
position, was continuously recorded using a data acquisition device (LabJack) and 
software (Ephus, MATLAB, Mathworks) working with custom software running 
on LabVIEW (National Instruments) which monitored threshold crossing. The 
behavioural setup was controlled by software (Dispatcher, Z. Mainen and C. Brody) 
running on MATLAB communicating with a real-time system (RTLinux). A 6-kHz 
tone marked a period during which lever-press was rewarded with water (~8 µl per
trial) paired with a 500-ms, 12-kHz tone, followed by an intertrial interval (variable 
duration, see below). Lever-press was defined as crossing of two thresholds (~1.5 
mm and ~3 mm below the resting position) within 200 ms. The 3-mm threshold 
defined the displacement required, and the 1.5-mm threshold ensured that the mouse 
did not hold the lever near the lower threshold. Failure to press during the cue period 
triggered a loud white noise and an intertrial interval. Lever presses during inter- 
trial intervals were neither rewarded nor punished. The cue period was decreased 
during the first two sessions from 30 s to 10 s. The reward period was reduced during
the first three sessions from 2 s to 0.4 s. The intertrial interval was increased
over the first three sessions from 2-4 s to 8-12 s. Each session lasted 20-30 min
(100-200 trials) and was terminated when mice stopped performing or had consumed 1 ml
of water. Experiments lasted for 11-14 sessions. One mouse failed to learn the task
and was thus excluded from all analyses. 
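
The lever-press criterion described above can be sketched as follows. This is an illustration only, not the LabVIEW threshold monitor used in the study: the function name, the 1-kHz sampling rate, and the reading that the 3-mm crossing must follow the most recent upward 1.5-mm crossing within 200 ms are assumptions.

```python
def detect_lever_presses(displacement_mm, fs_hz=1000, low_mm=1.5,
                         high_mm=3.0, window_ms=200):
    """Return sample indices at which a lever press is registered.

    displacement_mm: lever displacement below the resting position
    (mm, positive = pressed), sampled at fs_hz (assumed value).
    """
    window = int(window_ms * fs_hz / 1000)  # 200 ms expressed in samples
    presses = []
    last_low_cross = None  # sample of the latest upward 1.5-mm crossing
    for i in range(1, len(displacement_mm)):
        prev, cur = displacement_mm[i - 1], displacement_mm[i]
        if prev < low_mm <= cur:
            last_low_cross = i
        if prev < high_mm <= cur:
            # Register a press only if the lower threshold was crossed
            # recently enough, so holding near 3 mm does not count.
            if last_low_cross is not None and i - last_low_cross <= window:
                presses.append(i)
            last_low_cross = None
    return presses
```

Under this reading, a lever held between the two thresholds for longer than 200 ms and then pushed past 3 mm does not register as a press.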

Lesion. Motor cortex lesions were performed under isoflurane anaesthesia. After 
craniotomy was performed as above, cortical tissue was aspirated using a glass Pasteur 
pipette connected to vacuum. Care was taken to avoid damaging the underlying 
white matter. After the lesion, the cavity was filled with Surgifoam (Johnson & 
Johnson), KWIK-CAST (World Precision Instruments) and then with a layer of dental 
acrylic. Mice were allowed to recover for 3 days after the surgery and then placed 
on water restriction. Behavioural training started 2 weeks after lesion. The extent 
of lesion was determined for each mouse with post hoc histology. 

Muscimol inactivation. Mice with imaging windows were first trained with the 
task for 7-14 days. On the day of inactivation, the imaging window was removed 
and ~70 nl of muscimol (5 µg/µl in cortex buffer) was injected over 2-3 min in
the centre of the craniotomy at the depth of 300 µm, under isoflurane anaesthesia.
The craniotomy was then sealed and mice were allowed to recover in their home
cage on a heating pad for 60 min before behavioural experiments. For control injections,
muscimol was injected into the barrel cortex, using the coordinate of 1.4 mm
posterior and 3.1 mm lateral from bregma. 


Optogenetic inactivation. Surgery was performed on PV-Cre mice as above to 
inject AAV2/1-CAG-Flex-ChR2-tdTomato around the forelimb area of the motor 
cortex at five sites. 80 nl was injected at each site at each depth of 400 µm and 800 µm.
Behavioural training started 3 weeks after surgery. After seven sessions of daily 
training, the cortical area was inactivated in 20% of trials by activating PV neurons” 
by blue light from an LED (~40 mW, 470 nm, Doric Lenses) delivered directly onto 
the centre of the craniotomy covered with a chronic glass window. Blue light was 
delivered starting from 1-2 s before the cue period until the end of the cue period 
(that is, reward delivery or time out). Blue light delivery was performed in seven 
successive sessions. In the last two sessions, the glass window was covered with 
curable silicone (KWIK-CAST, World Precision Instruments) and these served as 
control sessions. 

Optogenetic microstimulation. Surgery was performed on C57BL/6 wild-type 
mice as above to inject AAV2/1-CAG-ChR2-Venus around the forelimb area of 
the motor cortex at 9 sites. 20 nl was injected at each site at the depth of 300 µm.
Three weeks after surgery, mice were head-fixed. Blue light from an LED (~40 mW, 
470 nm, Doric Lenses) was delivered directly onto the centre of the craniotomy 
covered with a chronic glass window for 1 s per trial with the intertrial interval of 
8-12 s. Stimulation was performed without anaesthesia. The effect of the light stimulus 
on forelimb movements was quantified by manually identifying movie frames in 
which the forelimb movements were observed on videos cropped so that the optical 
stimulation was not visible. Pre-light periods were defined as movie frames lasting 
from 2 to 1 s before light onset. Scoring was done by two individuals independently 
and trials which differed between the scorers were excluded (1/79 light trials, 3/79 
pre-light trials). 

Imaging. Imaging was conducted with a commercial two-photon microscope (B-scope,
Thorlabs) running ScanImage using a 16× objective (Nikon) with excitation
at 925 nm (Ti-Sa laser, Newport). Imaging was always conducted in awake
animals. For calcium imaging, images (512 × 512 pixels covering 472 × 508 µm)
were recorded at approximately 28 Hz in continuous segments about 2 min long
each with inter-segment intervals of 12 s. The trials that overlapped with the intervals
were discarded. Signals for the start of each trial were also recorded, which were used
to align images and behaviour data. Slow drifts in imaging field were manually corrected
using reference images. For structural imaging, stacks of image planes (512 ×
512 pixels covering 94 × 104 µm) were acquired at approximately 28 Hz, 20 frames
per plane, 80-120 planes per animal with a z-axis step size of 1.0 µm between planes.

Movement analysis. Movement bouts were identified in the lever displacement
traces (voltage recordings from the force transducer) that were down-sampled from 
10 kHz to 1 kHz and then filtered (4-pole 10 Hz low-pass Butterworth). The velo- 
city of the lever was then determined by smoothing the difference of consecutive 
points with a moving average window of 5 ms. The envelope of the velocity was 
then extracted using a Hilbert transform, and movement bouts were defined by 
the envelope crossing a threshold (4.9 mm per second). Each movement bout was 
extended by 75 ms, bouts separated by less than 500 ms were considered continu- 
ous, and then the start and end times were fine-tuned as follows. The start time was 
defined by finding when the lever position crossed a threshold exceeding the rest- 
ing period before the movement, and the end time was defined by finding when the 
lever position went below a threshold defined by the resting period following the 
movement. Thresholds were the resting position plus the 99th percentile of noise 
distribution defined as the difference between the Butterworth smoothed trace and 
the original trace. These processes were chosen empirically based on visual inspec- 
tion. For trial-based analyses, the trials in which animals were moving the lever at 
the onset of cue were excluded. 
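
The bout extension and merging step can be illustrated with a short sketch. It assumes that candidate bouts have already been obtained by thresholding the velocity envelope, that the 75-ms extension is applied to both ends of each bout, and that the 500-ms gap is measured after extension; none of these details is specified above, and the function name is hypothetical.

```python
def merge_bouts(bouts_ms, pad_ms=75, max_gap_ms=500):
    """Extend each bout and join bouts separated by short gaps.

    bouts_ms: list of (start, end) times in ms from envelope thresholding.
    Returns a sorted list of merged (start, end) tuples.
    """
    # Extend every bout by pad_ms on each side (assumed to be symmetric).
    padded = sorted((max(0, s - pad_ms), e + pad_ms) for s, e in bouts_ms)
    merged = []
    for start, end in padded:
        if merged and start - merged[-1][1] < max_gap_ms:
            # Gap shorter than max_gap_ms: treat as one continuous bout.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, bouts at 1,000-1,200 ms and 1,500-1,700 ms are joined into one bout, whereas a bout starting at 3,000 ms stays separate. The subsequent fine-tuning of start and end times against the resting-position thresholds is not included here.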

Image analysis. For calcium imaging data, lateral motion was corrected using full- 
frame cross-correlation image alignment (Turboreg” plugin for ImageJ). Motion 
within each frame was negligible due to the fast frame rate. Regions of interest (ROIs) 
were manually drawn using a custom MATLAB program by visually inspecting 
movies from all sessions and selecting neurons that showed at least one fluores- 
cence transient in at least one session. Therefore, our analysis excludes neurons 
that do not show any fluorescence transients during our imaging periods. ROIs were 
aligned across sessions using a semi-automated method and classified as excitatory 
or inhibitory based on tdTomato expression. ROIs which presented with a nucleus 
filled by GCaMP5G at any point during the experiment, indicating possibly abnor- 
mal physiology”, were excluded from all analyses. Other than these nucleus-filled 
neurons, GCaMP-expressing neurons have been shown to display normal physiological
properties including input resistance, resting membrane potential, input-
output relationship, synaptic input maps, and normal synaptic plasticity*”’. 

For structural imaging data, lateral motion for each image plane (20 frames) was 
corrected using full-frame cross-correlation image alignment (Turboreg™ plugin 
for ImageJ), with the average of the five most consistent consecutive frames as the 
reference image. After this alignment, all 20 frames within a plane were averaged. 
Different image planes were then aligned using recursive alignment of stacks of 
images (Stackreg, plugin for ImageJ).

Fluorescence analysis. Pixels within each ROI were averaged to create a fluorescence
time series. Background fluorescence fluctuations were subtracted from each
ROI to remove neuropil contamination as follows. A ring-shaped ‘background
ROI’ was created from the border of each neuronal ROI to a width of 6 pixels. From
this background ROI, pixels containing transients that did not contaminate the
neuronal ROI were excluded. These excluded pixels were identified as those that
contained time points at which pixel values exceeded the neuronal ROI by two times
the standard deviation of the difference between each background ROI pixel time
series and the neuronal ROI time series. The remaining pixels were averaged to
create a background fluorescence trace, and the ΔF of the background fluorescence
trace was subtracted from the neuronal ROI fluorescence trace to create the final
background-subtracted fluorescence trace for each neuronal ROI.
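
The pixel-exclusion rule for the background ROI can be sketched as below. The sketch covers only the exclusion and averaging steps, not the later ΔF subtraction. The function name, the use of the population standard deviation, and the literal reading of the criterion (pixel minus neuronal ROI greater than two standard deviations of that difference at any time point) are assumptions.

```python
from statistics import pstdev

def background_trace(neuron_trace, background_pixels, n_sd=2.0):
    """Average the background pixels that pass the exclusion criterion.

    neuron_trace: fluorescence time series of the neuronal ROI.
    background_pixels: list of per-pixel time series from the ring ROI.
    Returns the averaged background trace, or None if no pixel is kept.
    """
    kept = []
    for pixel in background_pixels:
        diff = [p - n for p, n in zip(pixel, neuron_trace)]
        limit = n_sd * pstdev(diff)
        # Exclude pixels that at any time exceed the neuronal ROI by more
        # than n_sd standard deviations of the pixel-minus-ROI difference.
        if not any(d > limit for d in diff):
            kept.append(pixel)
    if not kept:
        return None
    return [sum(vals) / len(kept) for vals in zip(*kept)]
```

In this form, a background pixel carrying a large transient that is absent from the neuronal ROI is dropped, so the transient does not enter the averaged background trace.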

The time-varying baseline (F0) of a fluorescence trace was estimated by smoothing
inactive portions of the trace using the iterative procedure detailed below.

Inactive portions of the trace were initially estimated as when the raw trace, loess-smoothed
with a 1-s window, was below a threshold. These inactive portions were
further shortened by 5 s on each end to exclude tails of active portions. The threshold
for activity was estimated by first creating a preliminary F0 approximation, which
was a 1-min moving average of the original fluorescence trace. This preliminary F0
was subtracted from the raw trace to yield a preliminary ΔF. The noise of the fluorescence
was calculated as the standard deviation of the difference between the
preliminary ΔF and the smoothed preliminary ΔF (loess, 1 s). An offset of the preliminary
F0 was then estimated as the mode of the smoothed preliminary ΔF. The
threshold for detecting active portions was then set as preliminary F0 + the offset +
two times the fluorescence noise. The remaining inactive portions of the trace
were then concatenated and subjected to a second round of activity extraction using
the same procedure, but by defining inactive portions where values fell within ±
two times the fluorescence noise.

Inactive portions were concatenated and smoothed (loess, 1 s). The resulting
smoothed trace was then broken up according to their original time points and
values were linearly interpolated across gaps (that is, active portions), resulting in
an F0 estimation that was independent of activity and slow drifts. This F0 estimation
was fine-tuned for an offset as follows. The F0 was subtracted from the raw
trace, yielding a new ΔF. The fluorescence noise was once again estimated by the
standard deviation of the difference between ΔF and smoothed ΔF (loess, 1 s). The
offset was then estimated by the mode of Gaussian fit of the distribution of values
of inactive portions of the ΔF. Inactive portions were defined as when the ΔF values
were within ± two times the noise values, and were shortened by 5 s on either end.
This offset was added to the F0 estimation, yielding the final F0.

Activity analysis. Activity event traces were created from normalized background-
subtracted fluorescence traces. For excitatory neurons, events were defined if the
first derivative (velocity) of the smoothed fluorescence trace (loess, 1 s) crossed five
times the standard deviation of the inactive velocity trace (inactive velocity trace
was derived from periods when the fluorescence was within three times the standard
deviation of the fluorescence trace). This velocity criterion detected sharp rises
in the fluorescence trace. For these detected events, the start and end times were
defined using the following iterative process. The peak time of the event was first
roughly estimated as the time when the velocity drops below zero for the first time
after it crossed the threshold as above. The peak time (that is, the end time of the
event) was then defined as the time of the highest ΔF/F0 value within five frames
before and after the initial estimate. We next defined the ‘baseline’ ΔF/F0 value for
the event as the ΔF/F0 value at the first time point when velocity was above zero
before the peak time. (This ‘baseline’ ΔF/F0 for each event is similar to the baseline
of the fluorescence trace (zero) except in cases when the event occurs during a decay
of another event; see the last event in Extended Data Fig. 4b for such an example.)
The start time of the event was then defined as the last time point before the peak
time when the ΔF/F0 value is within noise level from the ‘baseline’ ΔF/F0 (‘noise’
being three times the standard deviation of the difference between the raw ΔF/F0
trace and the loess-smoothed ΔF/F0 trace). An activity event trace was then constructed
which was zero except for frames with detected events, and each event was
assigned an amplitude equal to the difference between the peak ΔF/F0 and the ‘baseline’
ΔF/F0 for that event. This eliminated the decay of the calcium signal’, but the
use of velocity preserved events which occurred on top of the decays from other
events. For inhibitory neurons, activity event traces were generated with the following
procedure. The fluorescence noise was defined as the mean of the absolute
difference between the raw trace and a 1-s moving window loess-smoothed fluorescence
trace. A low threshold of 1 times the noise and a high threshold of 3 times the
noise were then set. Events were required to cross the high threshold. The start of
an event was defined as the time when the fluorescence trace crossed the low threshold
going up to capture the start of activity, and the end was defined as the time
when the fluorescence trace crossed the high threshold going down. An activity
event trace was then constructed which was zero at all frames except during detected
events which were assigned values of the original ΔF/F0 for those frames.
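
The two-threshold rule for inhibitory neurons can be sketched as a small state machine. The sketch assumes that the noise estimate is supplied by the caller and that a new event requires a fresh upward crossing of the low threshold; the function name is hypothetical, and this is one possible reading of the procedure rather than the authors' code.

```python
def inhibitory_event_trace(dff, noise, low_k=1.0, high_k=3.0):
    """Build an activity event trace from a dF/F0 trace (list of floats).

    An event starts at an upward crossing of low_k * noise, must reach
    high_k * noise, and ends when the trace falls below high_k * noise.
    """
    low, high = low_k * noise, high_k * noise
    events = [0.0] * len(dff)
    start = None          # frame of the latest upward low-threshold crossing
    reached_high = False  # True once the candidate event crossed `high`
    prev = float("-inf")
    for i, value in enumerate(dff):
        if not reached_high:
            if prev < low <= value:
                start = i
            elif value < low:
                start = None  # fell back below low without reaching high
            if start is not None and value >= high:
                reached_high = True
        elif value < high:
            # Downward crossing of the high threshold closes the event.
            events[start:i] = dff[start:i]
            start, reached_high = None, False
        prev = value
    if reached_high:  # event still open at the end of the recording
        events[start:] = dff[start:]
    return events
```

Frames inside a detected event keep their original values and all other frames are zero, matching the description of the inhibitory event trace.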




Classification of movement-related neurons. We observed higher levels of activ- 
ity during movement periods. In individual neurons, the average percentages of 
image frames that contained activity during movement periods versus non-movement 
periods were 0.68 ± 0.03% versus 0.23 ± 0.02% in excitatory neurons and 14.45 ± 0.49%
versus 6.95 ± 0.27% in inhibitory neurons (mean ± s.e.m.). Neurons whose activity
was significantly higher during movement periods were classified as movement- 
related using the following procedure. The amount of activity during movement 
was calculated for each neuron as the mean value of the activity event trace during 
movement epochs (defined as described above, and extending individual epochs 
by 5 image frames before and after each movement). The movement trace was then 
shuffled (10,000 times) such that complete movement epochs were kept intact but 
their position in the trace and relation to each other was randomized. A measure of 
activity during these shuffled movement epochs was calculated in each shuffle as 
above. The neuron was classified as movement-related if the real value was higher 
than the 0.5 percentile value of the shuffled values. Classification using only rewarded 
movements instead of all detected movements gave nearly identical results. 
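
A minimal sketch of the shuffle test follows. It permutes movement epochs and the gaps between them so that each epoch stays intact while its position changes, which is one way to realize the shuffle described above. The sketch interprets the stated cutoff as the top 0.5% of the shuffled distribution (that is, the 99.5th percentile); this interpretation, the seeded random generator, and all function names are assumptions.

```python
import random

def runs(mask):
    """Split a boolean movement trace into [value, length] runs."""
    out = []
    for m in mask:
        if out and out[-1][0] == m:
            out[-1][1] += 1
        else:
            out.append([m, 1])
    return out

def shuffle_epochs(mask, rng):
    """Permute epochs and gaps separately, keeping each epoch intact."""
    r = runs(mask)
    epochs = [n for v, n in r if v]
    gaps = [n for v, n in r if not v]
    rng.shuffle(epochs)
    rng.shuffle(gaps)
    shuffled = []
    for v, _ in r:  # keep the original alternation of epochs and gaps
        n = epochs.pop() if v else gaps.pop()
        shuffled.extend([v] * n)
    return shuffled

def mean_during(events, mask):
    """Mean of the activity event trace over frames flagged as movement."""
    vals = [e for e, m in zip(events, mask) if m]
    return sum(vals) / len(vals) if vals else 0.0

def is_movement_related(events, mask, n_shuffles=10000,
                        top_fraction=0.005, seed=0):
    """Compare real movement activity with the shuffled distribution."""
    rng = random.Random(seed)
    real = mean_during(events, mask)
    null = sorted(mean_during(events, shuffle_epochs(mask, rng))
                  for _ in range(n_shuffles))
    cutoff = null[min(len(null) - 1, int((1 - top_fraction) * len(null)))]
    return real > cutoff
```

A neuron that is active only during movement epochs exceeds nearly every shuffle and is classified as movement-related, whereas a silent neuron is not.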

For analysing the longitudinal dynamics of the fractions of movement-related 
neurons over sessions (Fig. 2b), the fraction of movement-related neurons in each 
session in each animal was normalized to the highest and lowest values of the animal. 
The correlation was significantly positive for the data points in sessions 1-3 (r = 0.60,
P < 0.01) and significantly negative for sessions 4-14 (r = −0.36, P < 0.01).

Population activity correlation. The stability of the population across days was
assessed by correlation of population vectors (Fig. 2d). Each square represents the 
correlation coefficient of the excitatory neuron population activity vectors in a pair 
of sessions. The population activity vectors are the concatenation of the fraction 
of rewarded movements during which each neuron exhibited activity events. (For 
example, if neurons a, b and c are active in 20%, 70% and 30% of rewarded move- 
ments, respectively, then the population activity vector of the session is (0.2, 0.7, 0.3)). 
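
The population vector and its session-to-session correlation can be written out as a short sketch. The function names and the input layout (one list of 0/1 flags per neuron, one flag per rewarded movement) are assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def population_vector(active_by_neuron):
    """Fraction of rewarded movements in which each neuron was active."""
    return [sum(flags) / len(flags) for flags in active_by_neuron]

def session_correlation_matrix(session_vectors):
    """Correlation coefficient for every pair of session vectors."""
    return [[pearson(a, b) for b in session_vectors]
            for a in session_vectors]
```

Each entry of the returned matrix corresponds to one square in a session-by-session correlation plot. Neurons must be listed in the same order in every session for the comparison to be meaningful.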

Activity onset timing analysis. We performed two analyses to define the timing
of activity of movement-related excitatory neurons relative to movement onset. To 
define when the population activity diverged from baseline, the activity of each neuron 
was first averaged across movements. Then the population activity vector in each 
image frame was compared to the baseline activity (all non-movement periods 
from all sessions) by bootstrap (10,000 repetitions), yielding a P value for each 
image frame. We identified image frames in which P values were below 0.01 in at 
least five successive frames, and defined the first frame of those as the time of diver- 
gence. The time of first population activity divergence from baseline was 105 ms 
before movement onset (Extended Data Fig. 6d). Similarly, to identify the timing of 
activity modulation of individual neurons, the activity of a neuron in each image 
frame across movements was compared to its baseline activity (420-315 ms before 
movement onset) by bootstrap (10,000 repetitions). We defined the activity onset 
timing as the first image frame in which P values were below 0.01 in at least five 
successive frames. 9.2% of movement-related excitatory neurons showed signifi- 
cant activity before movement onset (Extended Data Fig. 6f). 
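
The criterion of at least five successive frames with P < 0.01 can be sketched as follows. The sketch assumes that the per-frame P values from the bootstrap are already available as a list; the function name is hypothetical.

```python
def first_sustained_frame(p_values, alpha=0.01, min_run=5):
    """Return the index of the first frame that begins a run of at least
    min_run successive frames with P < alpha, or None if there is none."""
    run = 0
    for i, p in enumerate(p_values):
        run = run + 1 if p < alpha else 0
        if run == min_run:
            return i - min_run + 1
    return None
```

The returned frame index can then be converted to a time relative to movement onset using the imaging frame period.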

Analysis of movement-activity correlation. The relationship between movements 
and activity (Fig. 3) was analysed using 3 s (or 500 ms, Extended Data Fig. 7) starting 
from the onset of each rewarded movement, which was approximately the duration 
of rewarded movements for all animals over all sessions (2.62 ± 0.02 s, mean ± s.e.m.).
The learned movement and activity patterns were created by averaging the lever 
traces and the activity of excitatory neurons, respectively, of a randomly chosen 
half of the trials of sessions 10-14, considered the expert stage. The trials used to 
generate the learned patterns were excluded from correlation analysis. The move- 
ment correlation for each trial was the correlation coefficient of the lever trace of 
the rewarded movement in that trial with the learned movement pattern. The activity 
correlation for each trial was the correlation coefficient between the concatenated 
activity time series of all excitatory neurons in the trial and the concatenated learned 
activity pattern. For results shown in Fig. 3b, the random choice of trials to define 
the learned patterns was repeated 1,000 times and the results were averaged. 

Dendritic spine dynamics. Dendritic spines were manually scored over the entire 14 
training sessions using a custom-written IGOR program (J. Boyd and K. Haas). Spine 
analysis was done in three dimensions and the criteria were as previously described”'.
Analysis was done blind to the session number of each image, which was randomized.
We assumed that rapid ‘flickering’ of spines (elimination and immediate reformation,
or formation and immediate elimination) is rare and corrected our blind scoring
accordingly. While this corrected for mistakes in scoring, we may be slightly underestimating
spine dynamics. Specifically, if a spine was scored as absent in one session
(session X) and present in the immediately preceding (session X−1) and following
(session X+1) sessions, then it was called present on session X. Furthermore, if
a spine was scored as present in one session (session X) and absent in the immediately
preceding (session X−1) and following (session X+1) sessions, then it was
called absent on session X. No more than one correction was applied on any given
spine. If a spine score contained these gaps after one correction, that spine was excluded
from following analyses. These exclusions were rare (4 of 191). 8% of spines were
corrected (16 of 191). The results closely matched those from independent scoring
of the same data without shuffling dates (data not shown).
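
The flicker-correction rule can be expressed as a short sketch. It assumes that each spine is represented by a list of 0/1 presence calls, one per session, and that when several single-session flickers are present the earliest one is the one corrected; the function name and this tie-breaking choice are assumptions.

```python
def correct_spine_scores(scores):
    """Apply at most one flicker correction to a spine's presence calls.

    scores: list of 0/1 values, one per session.
    Returns the corrected list, or None if the spine should be excluded
    because a flicker remains after one correction.
    """
    def flickers(s):
        # Sessions whose call differs from two agreeing neighbours.
        return [i for i in range(1, len(s) - 1)
                if s[i - 1] == s[i + 1] != s[i]]

    found = flickers(scores)
    if not found:
        return list(scores)
    corrected = list(scores)
    corrected[found[0]] = corrected[found[0] - 1]
    return None if flickers(corrected) else corrected
```

A two-session absence such as 1, 1, 0, 0, 1 is left unchanged, because only calls that differ from both immediate neighbours are treated as flickers.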

Simultaneous two-photon guided cell-attached recordings and calcium 
imaging. Mice that had previously been used for calcium imaging were allowed 
full access to water before electrophysiology. On the day of the experiment, mice were 
anaesthetized with isoflurane and the glass window was removed and replaced with a 
glass half-window which was secured with superglue. The animals were then head- 
fixed in the imaging rig and allowed to recover from anaesthesia. Loose patch recordings
were performed with glass pipettes (~5-7 MΩ) filled with 100 µM Alexa Fluor
488 in saline. Excitatory neurons (negative for tdTomato) expressing GCaMP5G
without fluorescence in the nucleus were targeted for recording. Signals were amplified
500× by an Axon CNS amplifier (Molecular Devices), filtered at 2 kHz, recorded
(Ephus) at 10 kHz, and synchronized to the start of image acquisition. In 4 out of
6 neurons, imaging was done at the same zoom as the population imaging experiments
(field of view 472 × 508 µm), and at three times higher zoom for the other
two neurons. The results were similar at both zooms. 

Statistics. Non-parametric tests were used when possible to avoid assumptions about 
data distributions. Sample sizes were determined based on the statistical signifi- 
cance of our main findings, which is highly significant. Multiple comparisons were 
corrected for by Bonferroni corrections. Sample sizes (n) are as follows where appli- 
cable. Mice per session: 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 5. Imaged, rewarded trials
without movement at cue/total trials per session: 98/313, 173/526, 458/968, 488/
1,071, 438/966, 519/1,081, 424/946, 453/1,082, 320/784, 457/981, 481/889, 417/959,
412/853, 280/702. Movement-related/all imaged excitatory neurons: 79/874, 118/874,
202/1,122, 210/1,122, 201/1,122, 213/1,122, 181/1,122, 191/1,122, 159/1,122, 171/
1,122, 158/1,122, 136/995, 149/995, 89/843. Movement-related/all imaged inhib- 
itory neurons: 103/262, 117/262, 155/297, 167/297, 154/297, 133/297, 133/297, 124/ 
297, 106/297, 125/297, 119/297, 119/260, 114/260, 68/183.

Extended Data Figure 1 | Behaviour. The fraction of rewarded trials is
consistently high but the timing of behaviour improves during learning.
a, Fraction of trials that are rewarded. b, Time from cue onset to movement
onset decreases (P < 0.001, one-way ANOVA); inset, zoom. c, Time from
movement onset to reward decreases (P < 0.001, one-way ANOVA); inset,
zoom. d, The duration of each rewarded movement is stable throughout
learning (P = 0.94, one-way ANOVA). Grey, individual mice; red, mean of all
animals (a) or median of all trials (b-d).

Extended Data Figure 2 | Motor cortex is required for the lever-press task.
a, Aspiration lesion of motor cortex impairs learning. Mice were allowed to 
recover for 14 days after lesion before training. Left: histological image showing 
lesion in the right motor cortex and quantification of lesion extents in four mice 
shown as a density map of the fraction of animals in which the area was 
lesioned. Anterior is to the top; lateral to the right. + denotes bregma. The white 
circle indicates the imaged area. Middle: average time from movement onset to 
reward throughout learning. This time is longer in mice with motor cortex 
lesion (P < 0.01, two-way ANOVA), indicating that the mice with a lesion are 
less efficient in their movements. Right: correlation of lever movements in all 
pairs of trials within each session throughout learning. This correlation is lower 
in the mice with a lesion (P < 0.001, two-way ANOVA), indicating that the 
mice with a lesion do not develop reproducible movements. b, Injections of
muscimol, a GABA receptor agonist, into the imaged area acutely impair
performance (control versus muscimol in motor cortex, **P < 0.01, Wilcoxon 
rank sum test). Muscimol injections in the barrel cortex had no significant effect 
(control versus muscimol in barrel cortex, P = 0.35, Wilcoxon rank sum test). 
Control, n = 18 sessions in 6 mice; barrel cortex, n = 6 sessions in 6 mice; 
motor cortex, n = 6 sessions in 6 mice. c, The imaged cortical area was acutely 
inactivated by stimulation of ChR2 in parvalbumin-expressing inhibitory 
neurons by blue light in interleaved 20% of trials (n = 10 sessions in 2 animals). 
This optogenetic inactivation of the imaged area impaired performance on a 
trial-by-trial basis (***P < 0.001, Wilcoxon rank sum test). Blue light had no 
effect on behaviour when the window was covered with opaque silicone (n = 4 
sessions in 2 animals). All error bars are s.e.m.

Extended Data Figure 3 | Optogenetic stimulation of the imaged area
evokes forelimb movements in awake mice. a, Optogenetic excitation of the
imaged area triggers forelimb movements in mice expressing ChR2 but not
in control mice not expressing ChR2 (P < 0.001, chi-squared test). ChR2
expression does not alter spontaneous movement frequency in the absence of
stimulation (P = 0.64, chi-squared test). ‘During light’, 1-s light stimulation;
‘before light’, 2 to 1 s before light onset. n = 40 ‘during light’ trials and 38 ‘before
light’ trials in two ChR2 mice, 38 ‘during light’ trials and 38 ‘before light’ trials in
two control mice. b, Histological section showing the expression of ChR2 in
the motor cortex. Green, ChR2-YFP; blue, DAPI.

Extended Data Figure 4 | Simultaneous cell-attached recordings and two-photon
calcium imaging in awake mice. a, Left: in vivo two-photon image of
motor cortex neurons expressing GCaMP5G. The neuron in the centre is
targeted with a patch electrode. Right: after the recording session, voltage step
was applied to the electrode to activate the recorded neuron. The increased
GCaMP5G fluorescence in the middle neuron confirms that the neuron
was indeed targeted. b, Example GCaMP5G fluorescence trace (top: black
indicates fluorescence trace and red indicates detected calcium events) and
simultaneously recorded action potentials (bottom: black vertical ticks; the
numbers indicate the number of action potentials contained in each burst).
Horizontal red lines at bottom indicate the duration of detected calcium events.
Note the precise temporal relationship between action potentials and calcium
events. c, Table summarizing data from six neurons in two mice. Positive
offsets indicate the lag of the onset of detected calcium events relative to the first
spike in the burst. The offset (7.1 ± 41.4 ms) is on the order of the temporal
resolution of our imaging (~35 ms per image frame).

Extended Data Figure 5 | Lack of spatial clustering of movement-related
excitatory neurons. Each plot represents one animal. Red dots show the mean 
pairwise distance between movement-related excitatory neurons. Solid and 
dotted black lines show the mean and 95% confidence intervals, respectively, 
obtained from shuffling the identities of movement-related neurons among all 
excitatory neurons 10,000 times. Dots below the lower dotted line would 
indicate significant clustering of cells, whereas dots above the upper dotted line 
would indicate the significant dispersion of cells (P< 0.05). 
Extended Data Figure 6 | Additional analysis of population activity.
a, Cumulative distribution of fraction of sessions classified as
movement-related for inhibitory (red) and excitatory (green) neurons, showing
the relative invariance of inhibitory neurons and dynamism of excitatory neurons
(P < 0.001, Kolmogorov-Smirnov test). b, Movement-related excitatory
neuron populations in each session compared to the previous session. Grey, 
fraction of neurons classified in the previous session; white, not classified in the 
previous session. A large number of newly movement-related neurons were 
added in the first few sessions (P < 0.001, comparison between sessions 2-4 
versus 10-14, Wilcoxon rank sum test). c, Fraction of excitatory neurons
classified as movement-related in each session. Black, training (n = 7 mice, this
is the data shown in Fig. 2b); red, no training (n = 6 mice). The expansion of
movement-related neurons is specific to animals that underwent training
(P = 0.74, sessions 1-2 combined; P < 0.001, sessions 3-7 combined; Wilcoxon
rank sum test). d, Average population activity aligned to movement onset 
(black dotted line). Average activity (calcium event trace) of each 
movement-related excitatory neuron was averaged. The population activity 
diverged from baseline 105 ms before movement onset (red dotted line,
Methods). e, Standard deviation of activity timing of individual
movement-related excitatory neurons across sessions. Focusing on neurons 
that are classified as movement-related in three or more sessions, the standard 
deviation of activity onset timing relative to movement onset is plotted across 
sessions. Sessions were binned into one-third of the total number of sessions 
each neuron was classified. Activity timing became more stable on the 
neuron-by-neuron basis (r = -0.14, P < 0.001). f, Histogram of the time from
movement onset that the activity of each movement-related neuron 
significantly diverged from baseline. 9.2% of movement-related excitatory 
neurons show significant pre-movement activity, a composition similar to a 
previous study'*. 82.7% of activity of movement-related neurons occurred 
during the periods between 105 ms before movement onset and movement 
offset (Methods). g, The cumulative fraction plot of the timing of all activity 
onsets of movement-related excitatory neurons during rewarded movements. 
Each group of sessions is shown as a line, with different colours representing 
different sessions. The distribution of activity onset timing during later sessions 
shifts towards the movement onset (P < 0.001, Kolmogorov-Smirnov test for
all three comparisons). All error bars are s.e.m. 
Extended Data Figure 7 | Activity analysis focusing on the first 500 ms of
each movement. For the activity analyses in the main figures that used the
duration of 3 s after movement onset, we repeated the same analyses focusing
on the first 500 ms of each movement (median time from movement onset
to reward = 506 ms). This early activity shows progression throughout
learning, similar to when activity over 3 s was considered. a, Standard deviation
of the timing of activity onsets for movement-related excitatory neurons
over sessions, indicating a gradual refinement of activity timing (r = -0.18,
P < 0.001). Neurons that were active in less than five trials of a given session
were excluded from this analysis. The first bin contains only one data point and
thus does not have an error bar. This analysis is equivalent to Fig. 2g. b, Pairwise
trial-to-trial correlation of temporal population activity vectors increases
with learning (r = 0.38, P < 0.001). Temporal population activity vector was
defined as a concatenation of the activity traces of all movement-related
neurons and thus maintained temporal information within each movement.
This analysis is equivalent to Fig. 2h. c, Correlation of spatiotemporal activity
with the learned activity pattern is a function of the correlation of movement
with the learned movement pattern in expert sessions. Movements similar
to the learned movement pattern but made in naive sessions display activity
very different from the learned activity pattern (P = 0.28 and <0.001 in the bins
1 and 2-10, respectively, Wilcoxon rank sum test). This analysis is equivalent
to Fig. 3b. d, Pairwise trial-to-trial correlation of temporal population
activity vectors plotted as a function of movement correlation on those
trials. A strong relationship between population activity and movement
emerges during learning (P = 0.08, = 0.08, = 0.004, <0.001, = 0.002,
<0.001, = 0.001, = 0.002, <0.001 and = 0.046 for each bin, Wilcoxon rank
sum test). This analysis is equivalent to Fig. 3c. All error bars are s.e.m.
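The temporal population activity vector defined in panel b can be illustrated with a short sketch. It assumes, as the legend states, that the vector for a trial is the concatenation of the activity traces of all movement-related neurons; the toy traces, the use of the Pearson coefficient, and the function names are assumptions made for this example only.

```python
import itertools
import math


def population_vector(trial):
    """Concatenate the per-neuron activity traces of one trial."""
    return [value for trace in trial for value in trace]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pairwise_correlation(trials):
    """Average correlation over all pairs of trials in one session."""
    vectors = [population_vector(t) for t in trials]
    pairs = list(itertools.combinations(vectors, 2))
    return sum(pearson(u, v) for u, v in pairs) / len(pairs)


# Hypothetical sessions: 3 trials x 2 neurons x 3 time bins each.
naive = [
    [[1, 0, 0], [0, 0, 1]],
    [[0, 0, 1], [1, 0, 0]],
    [[0, 1, 0], [0, 1, 0]],
]
expert = [
    [[0, 1, 0], [0, 0, 1]],
    [[0, 1, 0], [0, 0, 1]],
    [[0, 1, 0], [0, 1, 1]],
]

print("naive :", round(mean_pairwise_correlation(naive), 3))
print("expert:", round(mean_pairwise_correlation(expert), 3))
```

Because the traces are concatenated rather than averaged over time, two trials correlate highly only if the same neurons are active at the same time bins, which is the sense in which the vector retains temporal information.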
Extended Data Figure 8 | Dynamics of dendritic spines in the hindlimb area
during learning of the lever-press task. Summary of dendritic spine dynamics 
in the hindlimb area during control period (7 days before training) and 
subsequent 7 days of training. Mice were water restricted in both conditions. 
Top: spine additions (black) and eliminations (grey) in each session. For 
control sessions, data from all sessions are combined. Bottom: total spine 
number across sessions. Values are normalized to the total spine number in 
session 1 in each condition. Unlike the forelimb area, the density of dendritic 
spines in the hindlimb area is relatively stable during learning (P = 0.07, 
comparisons between control versus training sessions 4-7, Wilcoxon rank sum 
test). All error bars are s.e.m. 
Extended Data Figure 9 | Schematic of learning-related changes in the
relationship of motor cortex activity and movement. Top: abstract space 
of activity patterns. Bottom: abstract space of movements. Circles in the 
movement space represent observed movements, and ovals in the activity space 
represent possible activity patterns that can lead to corresponding movements. 
Crosses and arrows represent example individual trials of activity-movement 
pairs. In naive animals, each trial involves variable activity and movement 
patterns as illustrated by scattered crosses and multiple movements. In this
stage, the relationship between activity and movement is inconsistent (that is,
degenerate), such that the same movement is derived from different activity
patterns in different trials. During learning, this degeneracy is reduced and a 
reproducible spatiotemporal activity pattern emerges in the motor cortex 
that reliably generates the learned movement. This learned activity pattern 
(bold cross) is rarely, if at all, observed in naive stages. 
The unfolded protein response governs integrity of
the haematopoietic stem-cell pool during stress

Peter van Galen, Antonija Kreso, Nathan Mbong, David G. Kent, Timothy Fitzmaurice, Joseph E. Chambers,
Stephanie Xie, Elisa Laurenti, Karin Hermans, Kolja Eppert, Stefan J. Marciniak, Jane C. Goodall, Anthony R. Green,
Bradly G. Wouters, Erno Wienholds & John E. Dick


The blood system is sustained by a pool of haematopoietic stem cells 
(HSCs) that are long-lived due to their capacity for self-renewal. A 
consequence of longevity is exposure to stress stimuli including reac- 
tive oxygen species (ROS), nutrient fluctuation and DNA damage’”. 
Damage that occurs within stressed HSCs must be tightly controlled 
to prevent either loss of function or the clonal persistence of onco- 
genic mutations that increase the risk of leukaemogenesis**. Despite 
the importance of maintaining cell integrity throughout life, how 
the HSC pool achieves this and how individual HSCs respond to stress 
remain poorly understood. Many sources of stress cause misfolded 
protein accumulation in the endoplasmic reticulum (ER), and sub- 
sequent activation of the unfolded protein response (UPR) enables 
the cell to either resolve stress or initiate apoptosis**. Here we show 
that human HSCs are predisposed to apoptosis through strong acti- 
vation of the PERK branch of the UPR after ER stress, whereas closely 
related progenitors exhibit an adaptive response leading to their 
survival. Enhanced ER protein folding by overexpression of the co- 
chaperone ERDJ4 (also called DNAJB9) increases HSC repopula- 
tion capacity in xenograft assays, linking the UPR to HSC function. 
Because the UPR is a focal point where different sources of stress 
converge, our study provides a framework for understanding how 
stress signalling is coordinated within tissue hierarchies and inte- 
grated with stemness. Broadly, these findings reveal that the HSC 
pool maintains clonal integrity by clearance of individual HSCs after 
stress to prevent propagation of damaged stem cells. 

The human haematopoietic hierarchy has recently been delineated 
at the single-cell level, enabling precise isolation of HSCs and progenitor 
cells’? (Extended Data Table 1). Pathway analysis using gene expres- 
sion signatures from lineage-depleted umbilical cord blood popula- 
tions revealed enrichment of UPR components in HSCs compared to 
progenitor cells (Fig. 1a and Extended Data Fig. 1a). The UPR encompasses
the IRE1, PERK and ATF6 pathways (Extended Data Fig. 1b).
Several genes of the PERK signalling branch were more highly expressed 
in a mixed population of HSCs and progenitor cells (CD34⁺ CD38⁻
HSPCs) compared to downstream progenitors (CD34⁺ CD38⁺) isolated
from cord blood and adult bone marrow (Fig. 1b and Extended
Data Fig. 1c-g). However, splicing of XBP1 messenger RNA, which is
representative of IRE1 activity, was lower in HSPCs compared to progenitors
(spliced/total XBP1, Fig. 1b). Taken together, gene expression
analysis of HSPC and progenitor fractions indicates differential activa- 
tion of UPR branches, with increased expression of PERK-dependent 
genes and decreased activity of IRE1 in HSPCs. 

To examine whether differential basal UPR gene expression reflects 
distinct ER stress responses in HSPCs and progenitors, we used two chemical
inducers of ER stress: thapsigargin and tunicamycin. Thapsigargin
disrupts Ca²⁺ homeostasis in the ER, rapidly activating all three
branches of the UPR. Treatment of sorted HSPCs and progenitors with
thapsigargin resulted in upregulation of the canonical UPR target genes
GRP94 (also called HSP90B1), GRP78 (HSPA5) and ERDJ4 (Fig. 1c and 
Extended Data Fig. 2a). XBP1 mRNA was rapidly spliced in progenitors 
but to a lesser extent in HSPCs. This indicates that differential XBP1 
splicing between HSPCs and progenitors under steady-state conditions 
is exaggerated upon thapsigargin-induced ER stress, consistent with 
repressed IRE1 splicing activity in HSPCs. 

Tunicamycin blocks synthesis of N-linked glycoproteins, causing accu- 
mulation of unfolded proteins in the ER’. Tunicamycin treatment 
resulted in higher upregulation of the canonical UPR genes GRP94, 
GRP78 and ERDJ4 in HSPCs compared to progenitors (Fig. 1d and Ex- 
tended Data Fig. 2b). Furthermore, upregulation of the PERK pathway 
constituents CHOP (also called DDIT3), ATF4 and GADD34 (PPP1R15A)
was higher in HSPCs compared to progenitors. In adult bone marrow, 
CHOP expression was also higher in HSPCs compared to progenitors 
after addition of a high tunicamycin dose (Extended Data Fig. 2c). Thus, 
basal enrichment of PERK pathway target genes in HSPCs is further 
amplified with tunicamycin treatment. 

Because persistent ER stress can lead to activation of apoptosis through 
signals downstream of the IRE1 and PERK branches of the UPR”, dif- 
ferential UPR branch activation between HSPCs and progenitors might 
influence cell fate outcome. Thapsigargin treatment did not result in 
survival differences between HSPCs and progenitors (Extended Data 
Fig. 3a). However, tunicamycin treatment significantly reduced HSPC 
viability and clonogenic capacity as compared to progenitors (Fig. 2a-c
and Extended Data Fig. 3b, c). Activation of HSPCs into the cell cycle with 
cytokines reduced the survival difference with progenitors, indicating 
that tunicamycin sensitivity may be linked to the inherent quiescence 
of HSPCs (Extended Data Fig. 3d). By focusing on highly purified fractions,
we found that tunicamycin caused selective loss by apoptosis of
phenotypic CD34⁺ CD38⁻ CD45RA⁻ CD90⁺ HSCs (Fig. 2d and Extended
Data Fig. 3e, f). Overall, these data indicate that tunicamycin-induced 
ER stress not only elicits distinct UPR signalling in HSCs compared to 
progenitors, but also causes selective apoptosis of HSCs. 

ER stress induces eIF2α phosphorylation (peIF2α) by PERK, leading
to global translational attenuation but, paradoxically, ATF4 and CHOP
translation is increased. ATF4 and CHOP can induce apoptosis
following prolonged ER stress, in part by upregulating the eIF2α
phosphatase GADD34, leading to increased protein load through
translational recovery (Extended Data Fig. 1b). We investigated whether the
increased apoptosis of HSCs compared to progenitors is linked to pref- 
erential PERK pathway activation. A lentiviral reporter vector was con- 
structed to measure the ATF4 translation rate’ (Fig. 3a and Extended 
Data Fig. 4a-c). As expected, the increased ATF4 reporter activation
that occurs after tunicamycin treatment was inhibited by PERK inhibi- 
tion (measured by transgene ratio, Extended Data Fig. 4d). This activa- 
tion was more efficient in HSPCs compared to progenitors (Fig. 3b, c), 
Medicine, University of Cambridge, Cambridge CB2 0XY, UK. Department of Pediatrics, McGill University and the Research Institute of the McGill University Health Centre, Westmount, Québec H3Z 2Z3,
Figure 1 | Elevated expression of PERK branch genes of the UPR in
HSCs compared to progenitors and further amplification after
tunicamycin-induced stress. a, Forty UPR-related genes from the nodes in
Extended Data Fig. 1a showed differential expression between HSCs and
progenitors (false discovery rate (FDR) <0.05). CMP, common myeloid
progenitor; GMP, granulocyte macrophage progenitor; MEP, megakaryocyte
erythrocyte progenitor; MPP, multipotent progenitor; MLP, multilymphoid
progenitor. b, Expression of key UPR genes in HSPC and progenitor fractions
was measured by qPCR. Results are shown as mean ± s.e.m. of n = 6 cord
blood samples. c, d, UPR branch activation depends on cell type and stressor.
Sorted HSPCs or progenitors were plated in the presence of thapsigargin (c) or
tunicamycin (d). RNA was isolated at different time points to measure gene
expression by qPCR. DMSO controls were the same between c and d. Data are
shown as mean ± s.e.m. of n = 3 cord blood samples; P value was calculated
based on treated/control cells and indicates differential response between
HSPCs and progenitors. *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001.
Figure 2 | HSCs are predisposed to apoptosis compared to progenitors after
treatment with the ER stress agent tunicamycin. a, b, Lower survival of
cord-blood- and bone-marrow-derived HSPCs compared to progenitors in the
presence of tunicamycin. Sorted HSCs/HSPCs and progenitors were plated
with 0.6 μg ml⁻¹ tunicamycin (a) or 3 μg ml⁻¹ tunicamycin (b). Symbols
represent viable cell counts of individual samples where populations are
connected by a black line; the blue line indicates mean ± s.e.m. of n = 16
cord blood samples and n = 5 bone marrow samples (a) or n = 7 cord blood
samples and n = 5 bone marrow samples (b). c, Reduced clonogenic potential
of HSCs compared to progenitors following tunicamycin treatment. HSCs
or progenitors were sorted into methylcellulose containing DMSO or
tunicamycin. Data are shown as mean ± s.e.m. of n = 4 cord blood samples;
NS, not significant. d, Tunicamycin treatment causes higher apoptosis in HSCs
compared to progenitors. Cord blood cells were plated with tunicamycin
and stained for primitive surface markers and Annexin-V/Sytox.
Quantification of viable cells is shown as mean ± s.e.m. of n = 5 cord blood
samples. P values indicate different viability between HSCs and progenitors.
*P < 0.05, **P < 0.01, ****P < 0.0001.
Figure 3 | HSCs are predisposed to UPR-induced apoptosis through
PERK-eIF2α-ATF4-CHOP-GADD34 signalling. a, Bidirectional lentiviral
reporter vector. All transduced cells are marked by TagBFP; GFP brightness
measures the ATF4 mRNA translation rate, which is regulated by upstream
open reading frames (uORFs) and depends on peIF2α (ref. 13). b, c, Higher
ATF4 reporter activity in HSPCs compared to progenitors. HSPCs and
progenitors were sorted, transduced with the ATF4 reporter and treated
with tunicamycin. b, Representative flow plots outline calculation of the
transgene ratio. MFI, mean fluorescence intensity. c, Results are shown as
mean ± s.e.m. of n = 4 cord blood samples. d, Constitutively active
GADD34ᴼᴱ has a more pronounced effect on HSPCs compared to progenitors.
Transduced HSPCs (left) or progenitors (right) were treated with tunicamycin.
Symbols represent n = 3 cord blood samples where control (ctrl) and
constitutively active GADD34ᴼᴱ (ca-GADD34ᴼᴱ) groups are connected by a
black line; P values were calculated using paired t-tests. e, f, Modulating the
PERK pathway rescues HSCs from apoptosis. HSPCs and progenitors were
plated with tunicamycin and the GADD34 inhibitor salubrinal (e) or the PERK
inhibitor GSK2606414 (f). Viability is shown as mean ± s.e.m. of n = 5 (e) or
n = 4 (f) cord blood samples. *P < 0.05, **P < 0.01, ***P < 0.001.


indicating that tunicamycin stimulates PERK pathway activity more 
strongly in HSPCs. As a second method to test PERK pathway involvement,
we overexpressed constitutively active GADD34, which prevents
peIF2α and upregulation of ATF4 and CHOP. Expression of constitutively
active GADD34 significantly increased survival of HSPCs, but
not progenitors, after tunicamycin treatment (Fig. 3d). In a third independent
approach, HSPCs and progenitors were treated with both tunicamycin
and salubrinal, which prevents eIF2α dephosphorylation.
Addition of salubrinal preferentially increased HSPC survival after tuni- 
camycin treatment (Fig. 3e). Finally, HSPCs and progenitors were treated 
with both tunicamycin and the PERK inhibitor GSK2606414 (ref. 20). 
Like salubrinal, GSK2606414 reduced the survival difference between 
HSPCs and progenitors (Fig. 3f). Thus, interfering with the PERK path- 
way at multiple junctions protects HSPCs from tunicamycin-induced 
apoptosis and equalizes the survival between HSPCs and progenitors.
Collectively, these data demonstrate that ER stress preferentially induces 
apoptosis of HSPCs compared to closely related progenitors through 
selective activation of the PERK branch of the UPR. 

We next asked whether the UPR was involved in regulating HSC 
function in vivo. CHOP is a main driver of apoptosis following PERK 
activation, and analysis of Chop⁻/⁻ mouse bone marrow showed a
small increase in the viability of mouse HSPCs (Extended Data Fig. 5a, 
b and Extended Data Table 2). This result indicates that Chop may be 
required for the survival/death balance of mouse HSPCs under physio- 
logical conditions. Next, we investigated whether enhanced ER protein 
folding would alter human HSC function. ERDJ4 increases the activity 
of the chaperone GRP78 and can associate with the ERAD machinery”’. 
These functions may enhance ER protein folding capacity and protect 
against UPR-induced apoptosis”. ERDJ4 expression was highest in 
CD49f* HSCs and reduced in all downstream progenitors (Fig. 4a). 
Green fluorescent protein (GFP)-marked ERDJ4 overexpression (ERDJ4ᴼᴱ)
lentiviral vectors were constructed to express different transgene levels
(Fig. 4b, c). ERDJ4ᴼᴱ conferred protection against tunicamycin-induced
cell death in the TEX cell line as well as HSPCs (Fig. 4d and Extended
Data Fig. 5c, d), suggesting that ERDJ4ᴼᴱ increases the threshold of ER
stress needed to induce apoptosis. To test whether ERDJ4 influences
human HSC function, transduced cord blood cells were transplanted
into immune-deficient mice. ERDJ4ᴼᴱ-transduced cells had a competitive
advantage as compared to the control group (Fig. 4e and Extended
Data Fig. 5e). To measure directly the impact of ERDJ4ᴼᴱ on the number
of functional HSCs, in vivo limiting dilution analysis (LDA) was performed.
At a low cell dose, ERDJ4ᴼᴱ resulted in higher engraftment
compared to control transduction (Fig. 4f and Extended Data Fig. 5f, g).
The LDA measurement demonstrated a 4.4-fold increase in the number
of repopulating HSCs on ERDJ4ᴼᴱ (Fig. 4g). These data indicate
that a protein folding factor classically associated with the UPR gov- 
erns HSC function in xenograft assays. 
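The limiting dilution analysis mentioned above is commonly evaluated with a single-hit Poisson model, in which a graft of d cells fails to engraft with probability exp(-f·d), where f is the frequency of repopulating cells. The sketch below estimates f by maximum likelihood under that assumption. It is not the software or the data used in this study: the cell doses and engraftment counts are invented for illustration, and `log_likelihood` and `estimate_frequency` are hypothetical helper names.

```python
import math


def log_likelihood(freq, groups):
    """Single-hit Poisson log-likelihood.

    groups: iterable of (cell_dose, n_mice, n_engrafted).
    """
    total = 0.0
    for dose, n_mice, n_pos in groups:
        # Non-engrafted mice: graft contained no repopulating cell.
        total += (n_mice - n_pos) * (-freq * dose)
        if n_pos:
            # Engrafted mice: log(1 - exp(-freq * dose)), computed stably.
            total += n_pos * math.log(-math.expm1(-freq * dose))
    return total


def estimate_frequency(groups, lo=1e-9, hi=1.0, iterations=200):
    """Maximise the unimodal log-likelihood by ternary search on log10(f)."""
    a, b = math.log10(lo), math.log10(hi)
    for _ in range(iterations):
        m1 = a + (b - a) / 3
        m2 = b - (b - a) / 3
        if log_likelihood(10 ** m1, groups) < log_likelihood(10 ** m2, groups):
            a = m1
        else:
            b = m2
    return 10 ** ((a + b) / 2)


# Hypothetical engraftment data: (cells injected, mice, engrafted mice).
control = [(1000, 5, 1), (3000, 5, 2), (10000, 5, 4)]
overexpression = [(1000, 5, 3), (3000, 5, 4), (10000, 5, 5)]

freq_ctrl = estimate_frequency(control)
freq_oe = estimate_frequency(overexpression)
print(f"control: 1 in {1 / freq_ctrl:.0f} cells")
print(f"overexpression: 1 in {1 / freq_oe:.0f} cells")
print(f"fold change: {freq_oe / freq_ctrl:.2f}")
```

The ratio of the two estimated frequencies is the kind of fold change reported for Fig. 4g; a full analysis would also report confidence intervals, which this sketch omits.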

Investigating the mechanism of increased human HSC engraftment
following ERDJ4ᴼᴱ, we found that ERDJ4ᴼᴱ does not change progenitor
engraftment, the frequency of phenotypic stem and progenitor
cells, lineage differentiation, homing or self-renewal as measured by
secondary LDA (Fig. 4h and Extended Data Fig. 6a-d). We proposed
that ERDJ4ᴼᴱ might protect against ER stress that could occur during
in vivo transplantation. Gene expression analysis of control-transduced
cord blood cells indicated increased CHOP and GADD34 expression
after transplantation, consistent with a stress response (Fig. 4i). With
ERDJ4ᴼᴱ, this surge in CHOP and GADD34 expression was absent,
indicating that ERDJ4ᴼᴱ prevents upregulation of stress-related genes.
Transplantation of human HSCs in the xenograft environment places 
them under replicative stress, which causes elevated ROS, DNA damage, 
and loss of HSC function’. These processes may be connected to the 
UPR as ROS and DNA damage can cause ER stress’ and ROS accu- 
mulation leads to UPR-mediated apoptosis of HSCs™. The association 
between ER protein folding and HSC engraftment indicates that mod- 
eration of UPR activation may improve HSC survival during stem-cell 
transplantation. 

Our results establish a previously unrecognized link between UPR 
signalling and human HSC function. Owing to distinct activation of UPR 
branches upon stress exposure, HSCs are rapidly cleared while progenitors 
are spared. This response of human HSCs is consistent with the selec- 
tive induction of apoptosis after DNA damage or ROS accumulation*”». 
Collectively, these experimental observations suggest that HSCs pos- 
sess an intrinsic biological focus on preventing propagation following 
damage, reducing malignancy risk. Because terminal differentiation purges 
damaged progenitor cell progeny, clonal purity may be of less importance 
to progenitors. Loss of HSCs and intestinal stem cells in Grp78⁻/⁻ mice
suggests that stem cells of multiple tissues can interrogate ER stress and 
use differential UPR activation to mitigate against potentially patholog- 
ical damage*®”’. Overall, our data point to the elimination of individual 
Figure 4 | ERDJ4ᴼᴱ protects from tunicamycin-induced apoptosis and
increases HSC output and frequency. a, ERDJ4 expression in sorted cord
blood populations (Extended Data Table 1). P values were calculated in
comparison to CD49f⁺ HSCs; qPCR results are shown as mean ± s.e.m. of
n = 3 cord blood samples. B/NK, B and NK cell progenitor. b, c, Validation of
lentiviral vectors for ERDJ4ᴼᴱ. b, Transduced cord blood cells were analysed
by qPCR. PGK and SFFV refer to lentiviral promoter driving ERDJ4 expression.
Results are shown as mean ± s.e.m. of n = 2 cord blood samples. c, Transduced
K562 cells were analysed by western blot. ERK2 is shown as a loading
control. d, ERDJ4ᴼᴱ protects from tunicamycin-induced apoptosis.
Transduced HSPCs were treated with tunicamycin. Symbols represent n = 11
cord blood samples where control and ERDJ4ᴼᴱ groups are connected by a grey
line; black line indicates mean ± s.e.m.; P value was calculated using a paired
t-test. e, ERDJ4ᴼᴱ confers a competitive advantage in vivo. Engraftment of
transduced cord blood cells was analysed 20 weeks after injection. Data are


HSCs after stress and damage as a paradigm of how the stem-cell pool 
maintains integrity, thereby ensuring long-term tissue maintenance. 


METHODS SUMMARY 


Umbilical cord blood cells were enriched for CD34⁺ cells and sorted by
fluorescence-activated cell sorting (FACS), cultured with small molecules,
and/or transduced with lentivirus. Quantitative RT-PCR was performed using
primer sequences in Extended Data Table 3. Apoptosis was assessed by
Annexin-V/Sytox staining followed by flow cytometry. Human HSC repopulation
was read out by intrafemoral
transplantation of cord blood cells into immune-deficient mice. Unless otherwise 
stated, P values were calculated by two-tailed unpaired Student's t-test. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


Cord blood and bone marrow sample preparation and liquid cell culture. Umbil- 
ical cord blood, bone marrow or mobilized peripheral blood from healthy indi- 
viduals was obtained with informed consent according to procedures approved by 
the institutional review boards of the University Health Network and Trillium 
Hospital. Mononuclear cells were obtained by centrifugation on Ficoll, and lineage
depletion was performed using the StemSep Human Progenitor Cell Enrichment
Kit according to the manufacturer’s protocol (50-75% CD34⁺, StemCell
Technologies). The Lin⁻ cord blood or Lin⁻ bone marrow/mobilized peripheral
blood cells were stored in IMDM with 50% FCS and 10% DMSO at -150 °C until
use. Unless stated otherwise, HSC/HSPC and progenitor fractions were sorted from
cord blood. For liquid culture, cells were thawed and plated in X-VIVO 10
(BioWhittaker) supplemented with 1% BSA, 2 mM L-glutamine, 100 U ml⁻¹
penicillin-streptomycin, and the following cytokines: TPO (7.5 ng ml⁻¹), SCF (50 ng ml⁻¹),
G-CSF (5 ng ml⁻¹), Flt3 ligand (50 ng ml⁻¹) and IL-6 (5 ng ml⁻¹), referred to as
TSGF6 culture. The cell line K562 was expanded in IMDM with 10% FCS, 2 mM
L-glutamine and 100 U ml⁻¹ penicillin-streptomycin; TEX cells were cultured as
reported previously. Both cell lines tested negative for mycoplasma.
Fluorescence-activated cell sorting and flow cytometry. To separate cell popu- 
lations, cord blood or bone marrow cells were re-suspended at 10” cells per ml, stained 
with antibodies against surface markers in PBS with 2% FCS, washed and sorted on 
the BD FACS Aria or MoFlo, consistently yielding >95% purity. Analytical flow 
cytometry was performed using BD LSRII or BD Canto cytometer. Data was ana- 
lysed with FlowJo software (Tree Star, Inc.). 

Annexin-V/Sytox and cleaved caspase-3 flow cytometry. For Annexin-V/Sytox 
apoptosis analysis, cells were stained for surface markers and washed in PBS with
2% FCS, re-suspended in binding buffer (diluted 10× in H₂O, BD catalogue number
556454) with Annexin-V-APC (50× dilution, BD 550474) and Sytox Blue (500×
dilution, Life Technologies S34857) and stained for 20 min at room temperature.
Then the sample was diluted 5× with binding buffer and cells were analysed by
flow cytometry within 60 min. For cleaved caspase-3 analysis, cells were permeabilized
for 30 min at room temperature with BD Perm 2 buffer (diluted 10× in H₂O,
catalogue number 347692), washed in 2 ml PBS with 2% FCS and stained with
PE-conjugated cleaved caspase-3 antibody from BD (catalogue number 561011) for 30 min
at room temperature. Cells were washed again and re-suspended in BD Cytofix
buffer (diluted 4× in PBS, BD catalogue number 554655) before flow cytometry
analysis.

Quantitative RT-PCR. RNA was extracted from 500-100,000 cells using TRIzol
(Life Technologies) and re-suspended in water for cDNA synthesis using the
SuperScript III or SuperScript VILO systems according to manufacturer’s instructions
(Life Technologies). For each qPCR reaction we added 2× Power SYBR Green mix
(Life Technologies), 133 nM forward primer and 133 nM reverse primer and
RNase-free water up to a total volume of 12.5 μl. cDNA was diluted 6-20× with
RNase-free water and 2.5 μl was added for each reaction. The qPCR was performed using a
7900 HT Real-Time PCR system with SDS v2.3 software (Applied Biosystems)
using standard settings: 50 °C for 2 min; 95 °C for 10 min; then 95 °C for 15 s and
60 °C for 1 min repeated for 40 cycles; then dissociation stage. Each assay was run 
in duplicate for technical variation. Arbitrary mRNA concentrations were calculated 
using the relative standard curve method. Gene expression levels were normalized 
to GAPDH except in Fig. 1b (normalized to the average of GAPDH, ACTB and PBGD) 
and Fig. 4a (normalized to the average of GAPDH and ACTB). To determine XBP1 
splicing levels, primers were designed that amplify all or only spliced XBP1 mRNA, 
and spliced XBP1 expression was divided by total XBP1 expression (indicated as 
spliced/total XBP1). qPCR primer sequences are listed in Extended Data Table 3. 
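The quantification steps in this paragraph (relative standard curve, normalization to a reference gene, and the spliced/total XBP1 ratio) can be sketched as below. This is an illustration under assumptions rather than the authors' calculation: all Ct values and the dilution series are invented, a single standard curve is reused for every primer pair for brevity (in practice each primer pair would have its own curve), and the function names are hypothetical.

```python
def fit_standard_curve(log10_conc, ct):
    """Least-squares line Ct = slope * log10(concentration) + intercept."""
    n = len(ct)
    mx, my = sum(log10_conc) / n, sum(ct) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(log10_conc, ct))
             / sum((x - mx) ** 2 for x in log10_conc))
    return slope, my - slope * mx


def quantity(ct_duplicates, curve):
    """Arbitrary mRNA concentration from the mean Ct of technical duplicates."""
    slope, intercept = curve
    mean_ct = sum(ct_duplicates) / len(ct_duplicates)
    return 10 ** ((mean_ct - intercept) / slope)


# Hypothetical tenfold dilution series of a reference cDNA (arbitrary units).
curve = fit_standard_curve([0, -1, -2, -3], [20.0, 23.3, 26.6, 29.9])

# Hypothetical duplicate Ct values for one sample.
gapdh = quantity([18.9, 19.1], curve)
chop = quantity([25.4, 25.6], curve)
xbp1_spliced = quantity([27.0, 27.2], curve)
xbp1_total = quantity([25.0, 25.1], curve)

chop_normalized = chop / gapdh            # expression relative to GAPDH
xbp1_ratio = xbp1_spliced / xbp1_total    # spliced/total XBP1

print(f"CHOP/GAPDH = {chop_normalized:.4f}")
print(f"spliced/total XBP1 = {xbp1_ratio:.3f}")
```

Where several reference genes are used, as for Fig. 1b and Fig. 4a, the denominator would be the average of the reference-gene quantities instead of GAPDH alone.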
Methylcellulose colony-forming assays. Methylcellulose (StemCell Technologies
MethoCult H4034) was supplemented with IL-6 (10 ng ml⁻¹), Flt3 ligand (10 ng ml⁻¹)
and DMSO or tunicamycin (0.6 μg ml⁻¹). Using the BD FACS Aria, 500 CD34⁺
CD38⁻ CD45RA⁻ CD90⁺ HSCs or 350 CD34⁺ CD38⁺ progenitors were deposited
in 2.5 ml methylcellulose, and duplicate dishes were plated with 1 ml each (200 HSCs 
or 140 progenitors per dish). After 13 days, colonies were counted and classified based 
on morphological appearance. 
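The cell numbers per dish follow directly from the sorted cell number and the plated volume, as the short calculation below shows. The sorted cell numbers and volumes are taken from the text; the colony count used for the plating efficiency is a made-up placeholder, and the function names are hypothetical.

```python
def cells_per_dish(sorted_cells, methylcellulose_ml, plated_ml):
    """Expected number of sorted cells in one dish, assuming even mixing."""
    return sorted_cells / methylcellulose_ml * plated_ml


def plating_efficiency(colonies, cells_plated):
    """Fraction of plated cells that gave rise to a colony."""
    return colonies / cells_plated


hsc_per_dish = cells_per_dish(500, 2.5, 1.0)         # 200 HSCs per dish
progenitor_per_dish = cells_per_dish(350, 2.5, 1.0)  # 140 progenitors per dish

# Hypothetical colony count for one DMSO dish, for illustration only.
example_efficiency = plating_efficiency(colonies=50, cells_plated=hsc_per_dish)

print(hsc_per_dish, progenitor_per_dish, example_efficiency)
```

Dividing colony counts by the cells plated per dish, rather than by the cells sorted, is what makes HSC and progenitor dishes comparable despite the different input numbers.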

Lentiviral vectors. The bidirectional lentiviral MA1 vector was modified by replacing
ΔNGFR with a loxP-flanked Gateway cassette (Life Technologies) to generate
the destination vector pMAL. To generate PGK-ERDJ4ᴼᴱ, pMAL was recombined
with the ERDJ4 entry vector from the Mammalian Gene Collection through the
PlasmID Repository at Harvard (Clone ID HsCD00076069). To generate constitutively
active (ca) GADD34ᴼᴱ, the GADD34 fragment was amplified by PCR from
the pLV-ca-GADD34 construct using the forward primer
5’-CACCATGGCCAGTGTGCTGG-3’ and the reverse primer 5’-TCACTGGGAAGGGAAGAAGG-3’,
cloned into an entry vector using the pENTR Directional TOPO cloning kit (Life
Technologies K2400-20), and this entry vector was recombined with pMAL. To generate
SFFV-ERDJ4ᴼᴱ, the PGK promoter of PGK-ERDJ4ᴼᴱ was replaced with the
stronger SFFV promoter. Control (Ctrl) vectors expressed a humanized Renilla
luciferase gene or a Stuffer sequence derived from pLKO.1_1.9Kb_stuffer instead
of ERDJ4 and ca-GADD34. The ATF4 reporter lentivirus was made as follows: 
first, the PGK promoter in pMAL was replaced with the stronger SFFV promoter 
and GFP was replaced with TagBFP (Evrogen”’) to generate the destination vector 
pSMALB. Next, the reporter fragments were amplified from ATF4.5: 5’ ATF4.GFP 
(Addgene 21852), ATF4.12: 5’ATF4.uORF14™“.GFP (Addgene 21859) and ATF4. 
14: 5'ATF4.uORF1&2°V" GFP (Addgene 21861) using the forward primer
5’-CACCAGCTTTTCTGCTTGCTGTC-3’ and the reverse primer
5’-TTACTTGTACAGCTCGTCCATGC-3’. These fragments were cloned into entry vectors using
the pENTR Directional TOPO cloning kit. Finally, the entry vectors were recombined 
with pSMALB to generate the bidirectional lentiviral reporter vectors pSMALB- 
ATF4.5rep (referred to as ATF4 reporter), pSMALB-ATF4.12rep and pSMALB- 
ATF4.14rep, respectively. In the ATF4 reporter, mRNA expression of TagBFP and
ATF4-GFP is highly correlated because of the bidirectional promoter. To account
for differences in basal translation between experimental conditions (such as HSPCs 
versus progenitors, high versus low transduction, DMSO versus tunicamycin treat- 
ment), we calculated the transgene ratio between GFP and TagBFP as a measure of 
reporter activity (TGR = GFP mean fluorescence intensity/TagBFP mean fluores- 
cence intensity, Fig. 3b). Reporter fluorescence was measured 30 h after addition of 
tunicamycin. For in vitro experiments, transduced cord blood cells were cultured
for 3 days before treatment to allow transgene expression to build up.
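
The transgene ratio defined above can be sketched in a few lines of Python. This is only an illustration of the stated formula (TGR = GFP mean fluorescence intensity / TagBFP mean fluorescence intensity); the function name, the per-cell input format and all intensity values are hypothetical and are not taken from the study.

```python
from statistics import mean


def transgene_ratio(gfp, tagbfp):
    """Return TGR = mean GFP intensity / mean TagBFP intensity.

    gfp and tagbfp are per-cell fluorescence intensities of the gated,
    transduced population (hypothetical input format).
    """
    if not gfp or not tagbfp:
        raise ValueError("need at least one event per channel")
    bfp_mfi = mean(tagbfp)
    if bfp_mfi <= 0:
        raise ValueError("TagBFP mean fluorescence intensity must be positive")
    return mean(gfp) / bfp_mfi


# Hypothetical intensities (arbitrary units), not data from the study.
tagbfp_events = [1000.0, 1200.0, 800.0]
tgr_dmso = transgene_ratio([100.0, 120.0, 80.0], tagbfp_events)
tgr_tunicamycin = transgene_ratio([300.0, 360.0, 240.0], tagbfp_events)

# Dividing by TagBFP corrects for transduction level and basal expression,
# so the fold change reflects reporter translation only.
fold_induction = tgr_tunicamycin / tgr_dmso
print(f"TGR DMSO={tgr_dmso:.2f}, tunicamycin={tgr_tunicamycin:.2f}, "
      f"fold={fold_induction:.1f}")
```

Because both fluorophores are driven from the same bidirectional promoter, normalizing GFP to TagBFP removes differences that arise from vector copy number rather than from ATF4 translation.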

Lentivirus production and primary cell transduction. Viral particles were pseudotyped
with the vesicular stomatitis virus G (VSVG) protein using the pMD.G
vector, and third-generation pMDLg/pRRE and pRSVRev vectors were used for
packaging in 293T cells using calcium-phosphate transfection (Clontech catalogue
number 631312). Lentivirus was concentrated 100× by ultracentrifugation, resuspended
in X-VIVO 10 (BioWhittaker) supplemented with 1% BSA and stored at
−80 °C until use. Cord blood cells were thawed and plated in liquid TSGF6 culture
with double the concentration of all cytokines, and lentiviral suspension was added
at a multiplicity of infection of 5–20 in a total volume of 100 µl (96-well plate) or
400 µl (24-well plate). After 16 h, TSGF6 culture medium was added to expand the
cells. Transduction was measured using flow cytometry for GFP or TagBFP after 
72-96 h. 

Mouse xenotransplantation and human Lin⁻ cell isolation. Mouse xenografts
were performed as described previously according to protocols approved by the
University Health Network Animal Care Committee. Briefly, 8–12-week-old male
NOD/Lt-scid/IL2Rγnull (NSG) mice were sublethally irradiated (225 cGy) 1 day
before injection. Cells were injected intrafemorally in 30 µl PBS. Peripheral blood
(80 µl) was taken from the saphenous vein and analysed by flow cytometry. Mice
with >40% human T cells in the peripheral blood are likely to harbour an autoimmune
clonal T-cell expansion and were excluded from analysis. To reach statistical
significance, all animal studies were repeated with 3 cord blood samples with
at least 2, but generally 5, mice per condition. Animals were not randomized before
injection, and no blinding was done for animal studies. After the mice were euthanized,
femurs were flushed with 2 ml PBS 2% FCS; 50 µl was stained for surface markers
and analysed by flow cytometry. Unless stated otherwise, engraftment data for the
injected femur are shown. For lineage depletion, bone marrow cells from the femurs and
tibias of 5 mice were combined and processed with the StemSep Mouse/Human
Chimera Enrichment kit (Stem Cell Technologies) according to the manufacturer's
instructions. However, during the antibiotin incubation step, an additional 50 µl ml⁻¹
human haematopoietic progenitor enrichment antibody cocktail from the StemSep 
Human Progenitor Cell Enrichment Kit was added to deplete human lineage-positive 
cells. 

In vivo repopulation and serial limiting dilution analysis. For competitive repopulation
experiments, cord blood cells were transduced and ~50,000 cells were
injected per mouse the next day. After 20 weeks, the mouse bone marrow was analysed
for human CD45⁺ engraftment as well as GFP and lineage differentiation.
The median percentage of GFP⁺ cells within the human CD45⁺ graft was normalized
to initial transduction for PGK-Ctrl and PGK-ERDJ4ᴼᴱ groups. For limiting
dilution analysis (LDA) in primary recipients, transduced cord blood cells were
expanded for 10 days in liquid TSGF6 culture and GFP⁺ cells were sorted and injected
at low and high cell doses. Human CD45⁺GFP⁺ engraftment was analysed
10 weeks after transplantation. LDA in secondary recipient mice was performed
by sorting GFP⁺ cells from pooled bone marrow of primary mice. Secondary mice
were injected with 30,000 to 1,000,000 cells and analysed 10 weeks after transplantation;
a secondary mouse was scored as positive if it had >0.01% human engraftment.
HSC frequency was estimated using the ELDA software (Extreme Limiting
Dilution Analysis, http://bioinf.wehi.edu.au/software/elda/). Progenitor engraftment
was tested by overnight transduction of sorted CD34⁺CD38⁺ progenitors and injection
of 8,700–14,000 cells per mouse. The percentage of human CD45⁺GFP⁺ cells
in the mouse bone marrow after 2–4 weeks was normalized to initial transduction.
To assess gene expression changes on transplantation, mice were killed at 19 h,
1 week and 2 weeks after transplantation of 1.0–1.6 × 10⁶ transduced cord blood
cells. Human GFP⁺CD34⁺ cells were sorted from the xenografted mouse bone
marrow (at 19 h and 1 week after transplant, all GFP⁺ cells were sorted due to cell
number constraints).
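
The idea behind limiting dilution analysis can be illustrated with a minimal single-hit Poisson sketch. It assumes that a mouse stays negative only if it received no functional HSC, so that P(negative at dose d) = exp(−f·d) for a frequency f per injected cell. This is not the ELDA implementation (ELDA additionally provides confidence intervals and tests of the single-hit assumption); the function name and the example counts are hypothetical.

```python
import math


def estimate_frequency(doses, tested, positive):
    """Maximum-likelihood HSC frequency under a single-hit Poisson model.

    doses[i]    cells injected per mouse in dose group i
    tested[i]   number of mice analysed in that group
    positive[i] number of mice scored as engrafted in that group
    """
    n_pos, n_all = sum(positive), sum(tested)
    if n_pos == 0 or n_pos == n_all:
        raise ValueError("need both engrafted and non-engrafted mice")

    def log_likelihood(log_f):
        f = math.exp(log_f)
        total = 0.0
        for d, n, r in zip(doses, tested, positive):
            if r:
                # P(positive) = 1 - exp(-f * d)
                total += r * math.log(-math.expm1(-f * d))
            # P(negative) = exp(-f * d)
            total -= (n - r) * f * d
        return total

    # The likelihood has a single maximum, so a ternary search on log(f)
    # between 1e-12 and 1 cell^-1 is sufficient for this sketch.
    lo, hi = math.log(1e-12), 0.0
    for _ in range(200):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1) < log_likelihood(m2):
            lo = m1
        else:
            hi = m2
    return math.exp((lo + hi) / 2)


# Hypothetical secondary-transplant scores (positive if >0.01% engraftment).
doses = [30_000, 100_000, 1_000_000]
tested = [5, 5, 5]
positive = [1, 3, 5]
freq = estimate_frequency(doses, tested, positive)
print(f"Estimated frequency: 1 in {1 / freq:,.0f} cells")
```

The estimate is only defined when at least one mouse is positive and at least one is negative, which is why cells are injected at both low and high doses.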

Analysis of Chop⁻/⁻ mouse bone marrow. Chop⁻/⁻ and wild-type littermates on
the C57BL/6 background were housed in a specific pathogen-free facility according 
to the Animals Scientific Procedures outlined by the UK. Femurs, tibias and pelvises 
of 13-week-old male mice were flushed to isolate bone marrow cells. Five to ten 
million cells were stained with antibodies against surface markers (HSC panel: lineage
markers, c-Kit, Sca-1, CD34 and Flk2; progenitor panel: lineage markers,
c-Kit, Sca-1, CD34 and FcγII/IIIR; Extended Data Table 2). Next, cells were stained
using Annexin-V Pacific Blue conjugate (Biolegend) and 7-AAD according to the
manufacturer’s instructions, and analysed on a BD Fortessa cytometer.

Tunicamycin, thapsigargin, salubrinal and GSK2606414. Compounds were purchased
as follows: tunicamycin, Sigma-Aldrich, catalogue number T7765; thapsigargin,
Sigma-Aldrich, catalogue number T9033; salubrinal, Santa Cruz, catalogue
number SC202332; GSK2606414, Merck Millipore, catalogue number 516535.
Powder was resuspended in DMSO and stored at −20 °C until use. Final DMSO
concentration was always <0.6% and equal between treatment and control groups.
Unless otherwise indicated, cell counts and viability analyses were performed after
40 h of treatment. Viable cells were counted manually by Trypan blue exclusion or
automatically using the BD Canto flow cytometer high throughput sampler (HTS), by
counting the number of Annexin⁻ and/or Sytox⁻ cells in a specified volume.

Western blot. Transduced K562 cells were lysed, separated with SDS-PAGE and
transferred onto a polyvinylidene fluoride membrane as previously reported. Specific
antibody to ERDJ4 (Abnova catalogue number H00004189-M09) was detected
using secondary HRP-conjugated antibodies (Amersham) and visualized by
chemiluminescence (Pierce).

Gene expression and pathway analysis. Gene expression data sets were reported
previously. The genes upregulated in HSC compared to progenitor (MLP/CMP/
GMP/MEP) with adjusted P value <0.01 were used for gene ontology (GO) analyses
with BiNGO. The algorithm was used with the hypergeometric test, multiple test
correction (Benjamini-Hochberg false discovery rate (FDR)) and the whole
Homo sapiens annotation as a reference set. Data were visualized with Cytoscape;
gene sets linked to the UPR were unlinked from the rest of the network for
presentation. For the heat map, we checked the expression of all genes belonging to
GO categories related to the UPR (79 genes belonging to the GO_0006986,
GO_0034620, GO_0034976 and GO_0030968 categories). Forty of these were
differentially expressed between HSC and progenitors (FDR <0.05). Their expression levels
in HSC, MPP, MLP, CMP, GMP and MEP were mean centred.
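
The two statistical steps named above, a hypergeometric overrepresentation test followed by Benjamini-Hochberg correction, can be sketched as follows. This is a generic illustration and not BiNGO's code; the function names and all gene counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k): chance of seeing at least k annotated genes in a list of
    `drawn` genes sampled from `population` genes, `annotated` of which
    belong to the GO category."""
    upper = min(annotated, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, drawn)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values (FDR) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (category size, hits in the HSC-enriched list).
population_size, list_size = 20_000, 500
categories = {"category_A": (79, 12), "category_B": (300, 9), "category_C": (50, 1)}

names = list(categories)
raw = [hypergeom_sf(categories[n][1], population_size, categories[n][0], list_size)
       for n in names]
for name, p, q in zip(names, raw, benjamini_hochberg(raw)):
    print(f"{name}: P={p:.3g}, FDR={q:.3g}")
```

Categories whose adjusted value falls below the chosen FDR threshold would be reported as overrepresented.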

Statistical analysis. Unless otherwise stated, mean ± s.e.m. values are given and
P values were calculated by two-tailed unpaired Student’s t-test. Mann-Whitney
U-tests were performed to compare engraftment levels, as these data do not show
a normal distribution. ELDA software was used for statistical analysis of in vivo
LDA (http://bioinf.wehi.edu.au/software/elda). *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001.
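
For the small group sizes used here (typically 5 mice per condition), a rank-based comparison can be computed exactly by enumeration. The sketch below is a generic exact two-sided Mann-Whitney test and is not the software used in the study; the function names and the engraftment values are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U statistic for x, exact two-sided P value)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    observed = sum(ranks[:n1])
    centre = n1 * (n1 + n2 + 1) / 2  # expected rank sum under the null
    extreme = total = 0
    # Enumerate every way of assigning n1 of the pooled ranks to group x.
    for idx in combinations(range(n1 + n2), n1):
        total += 1
        rank_sum = sum(ranks[i] for i in idx)
        if abs(rank_sum - centre) >= abs(observed - centre) - 1e-9:
            extreme += 1
    u = observed - n1 * (n1 + 1) / 2
    return u, extreme / total


# Hypothetical % human engraftment in two groups of five mice.
control = [0.8, 1.5, 2.1, 3.0, 4.2]
treated = [3.5, 5.1, 6.4, 7.7, 9.0]
u_stat, p_value = mann_whitney_exact(control, treated)
print(f"U={u_stat}, two-sided P={p_value:.4f}")
```

Enumeration is feasible only for small groups (252 assignments for 5 versus 5 mice); larger samples would normally use a normal approximation.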

Extended Data Figure 1 | Expression analysis of UPR-related genes.
a, Enrichment of UPR-related genes in human HSCs compared to
progenitors. CD49f⁺ HSC-enriched genes were analysed for GO category
overrepresentation. Node size represents the number of genes; white, yellow
and orange colours correspond to FDR <0.15, <0.1 and <0.01. b, Simplified
scheme illustrating UPR signalling events. Three branches of the UPR are
activated upon ER stress: IRE1, PERK and ATF6. IRE1 splices cytosolic XBP1
mRNA to enable translation of the XBP1s transcription factor, which
upregulates chaperones and ER-associated degradation (ERAD) machinery to
resolve ER stress. PERK initiates a different branch of the UPR through
phosphorylation of eIF2α, which attenuates global protein synthesis, thus
permitting time to restore ER homeostasis. Prolonged ER stress leads to PERK
signalling-mediated upregulation of the proapoptotic transcription factor
CHOP and its target GADD34. GADD34 dephosphorylates eIF2α, leading to
restoration of global protein translation. However, if ER stress is not resolved,
GADD34 upregulation can lead to further accumulation of misfolded proteins,
oxidative stress and apoptosis. Yellow highlighted arrows indicate
transcriptional regulation. c, Amplification curves of qPCR reactions for
UPR-related genes. Fluorescence signal during 40 cycles of qPCR reactions
on cord-blood-derived cDNA is shown for a representative experiment. Green
line indicates threshold that was used to calculate mRNA quantity.
d, Dissociation curves were generated to check for the presence of nonspecific
amplicons or primer dimers, which would be visible as additional peaks. Each
line represents the dissociation curve of one qPCR reaction; colours indicate
different genes. e, Slopes and R² values of standard curves are shown for a
representative experiment. These values were calculated separately for each
experiment, based on a cDNA dilution series. c–e were performed using SDS
v2.3 software. f, Agarose gel analysis of qPCR amplicons. qPCR reactions
were run on a 3% agarose gel to check for reaction specificity: nonspecific
amplicons would be visible as additional bands. The expected product size is
shown above the gel; the ladder sizes are indicated on the right. g, Adult bone
marrow cells were sorted into HSPC and progenitor fractions. mRNA levels
for CHOP and ERDJ4 were measured by qPCR. Results are shown as
mean ± s.e.m. of n = 5 bone marrow samples. ****P < 0.0001.

Extended Data Figure 2 | Differential response of HSPCs and progenitors
to ER-stress-inducing agents. a, b, HSPC and progenitor fractions were sorted
and plated in the presence of (a) thapsigargin or (b) tunicamycin. mRNA
was isolated after 0.5, 1, 6, 16 and 40 h and expression levels of GRP78, ERDJ4,
GADD34 and ATF4 were assessed by qPCR. The DMSO-treated controls
were the same between a and b. Data are shown as mean ± s.e.m. of n = 3 cord
blood samples; P value was calculated based on fold change of treated over
DMSO control cells and indicates differential response between HSPCs and
progenitors. c, Adult bone marrow HSPCs and progenitors were sorted and
plated in the presence of tunicamycin. After 16 h, mRNA was isolated
and expression levels of CHOP, ERDJ4 and GRP94 were assessed by qPCR.
Data are shown as mean ± s.e.m. of n = 5 bone marrow samples.

Extended Data Figure 3 | Survival of HSCs is lower than that of
progenitors after tunicamycin, but not thapsigargin, treatment.
a, Thapsigargin has similar toxicity for sorted HSC and progenitor fractions.
Sorted HSCs and progenitors were plated in TSGF6 culture conditions in the
presence of thapsigargin or DMSO control. Symbols represent viable cell
counts of individual samples where fractions are connected by a black line; the
blue line indicates mean ± s.e.m. of n = 7 cord blood samples. b, c, Reduced
clonogenic capacity of sorted HSCs compared to progenitors after tunicamycin
treatment. Total colony number is shown in Fig. 2c. Here, data are separated
into colony types based on morphological appearance. Data are shown as
mean ± s.e.m. of n = 4 cord blood samples. G, granulocyte; M, macrophage;
GM, granulocyte/macrophage; BFU, erythroid burst forming unit; mix,
multilineage. d, HSCs have lower survival compared to progenitors after
tunicamycin treatment, even after cell cycle induction. Sorted HSC and
progenitor fractions were plated in TSGF6 culture conditions with double
cytokine concentrations for 72–96 h to induce G0 exit of the HSC fraction.
Then, cells were plated in the presence of tunicamycin. Viable cell counts as a
percentage of DMSO controls are shown. Symbols represent individual samples
where fractions are connected by a black line; the blue line indicates
mean ± s.e.m. of n = 5 cord blood samples at 0.6 µg ml⁻¹ and n = 3 cord blood
samples at 3 µg ml⁻¹ tunicamycin. e, f, Increased apoptosis of HSCs compared
to progenitors after tunicamycin treatment. e, Cord blood cells were plated with
tunicamycin and stained for primitive surface markers, Annexin-V and Sytox.
Representative flow plots are shown. f, Sorted HSCs and progenitors were
plated in the presence of tunicamycin. The percentage of viable Annexin-V⁻
cells after 40 h compared to DMSO controls is shown as mean ± s.e.m. of n = 4
cord blood samples. **P < 0.01, ***P < 0.001, ****P < 0.0001.

Extended Data Figure 4 | ATF4 reporter enables visualization of increased
ATF4 translation after tunicamycin treatment. a, ATF4 reporter validation.
Two upstream ORFs (uORFs) that are 5’ of the ATF4 coding sequence in
the ATF4 mRNA ensure more efficient translation of ATF4 when eIF2α
phosphorylation levels are high. A bidirectional lentiviral vector was
constructed that gives constitutive expression of TagBFP to mark transduced
cells. In the other direction, the SFFV promoter drives expression of the 5’ end
of the ATF4 mRNA which fuses with a GFP reporter gene 3’ of the termination
codon of uORF2. HeLa cells were transduced with pSMALB-ATF4.5rep
(referred to as ATF4 reporter) and treated with tunicamycin. After 30 h, GFP
fluorescence was read out by flow cytometry. Histogram plots show n = 2
technical duplicates (two black lines for DMSO control, two red lines for
tunicamycin treatment). b, c, Reporter fluorescence depends on uORFs.
HeLa cells were transduced and treated with tunicamycin. As expected,
ATF4-GFP translation is (b) repressed in the negative control that has a
mutated uORF1 start codon and (c) constitutively high in the positive control
with mutated start codons for both uORFs. Histogram plots show n = 2
technical duplicates. d, ATF4 reporter-transduced cord blood cells were treated
with tunicamycin and increasing doses of the PERK inhibitor GSK2606414.
The transgene ratio (TGR) is shown as mean ± s.e.m. of n = 6 cord blood
samples (except at 600 nM, n = 3 cord blood samples). *P < 0.05, **P < 0.01,
****P < 0.0001.

Extended Data Figure 5 | Modulation of UPR-associated genes affects
haematopoietic stem and progenitor cells in vivo. a, Analysis of
haematopoietic stem and progenitor cell frequencies in Chop⁻/⁻ mice. Flow
cytometry was performed on mouse bone marrow (Extended Data Table 2).
Bars show the absolute cell production in each population from wild-type or
Chop⁻/⁻ mice. Data are shown as mean ± s.d. of n = 5 mice per group.
b, Viability analysis of stem and progenitor cell populations in Chop⁻/⁻ mice.
The percentage of viable Annexin-V⁻7-AAD⁻ cells within the HSC-enriched
LSK and Lin⁻Sca-1⁻c-Kit⁺ progenitor fractions was assessed by flow cytometry.
Data are shown as mean ± s.d. of technical duplicates of n = 5 mice per group.
c, ERDJ4ᴼᴱ cells show increased survival after tunicamycin treatment.
The haematopoietic TEX cell line was transduced with SFFV-Ctrl or
SFFV-ERDJ4ᴼᴱ lentiviral vectors and plated in the presence of 0.6 µg ml⁻¹
tunicamycin (SFFV refers to the lentiviral promoter driving transgene expression).
After 48 h, the number of transduced cells compared to DMSO-treated controls
was determined by automated counting of GFP⁺ cells. Data are shown as
mean ± s.d. of n = 3 independent experiments; P value was calculated using a
paired t-test. d, Tunicamycin-induced apoptosis is reduced by ERDJ4ᴼᴱ. Cells
from c were analysed for Annexin-V and cleaved caspase-3 expression by flow
cytometry. Data are shown as mean ± s.d. of n = 3 independent experiments;
P values were calculated using paired t-tests. e, ERDJ4ᴼᴱ endows cord blood
cells with a competitive advantage over untransduced cells. Three cord blood
pools (Exp. 1–3) were transduced with PGK-Ctrl or PGK-ERDJ4ᴼᴱ lentiviruses
and injected into 5 mice each. Dashed line indicates GFP% after transduction
(day 0); solid line indicates median GFP% of the human CD45⁺ graft in the
injected femur of xenografted mice (20 weeks). Every symbol represents one
mouse. f, Similar expansion of PGK-Ctrl and PGK-ERDJ4ᴼᴱ transduced cord
blood cells in vitro. Three cord blood pools (Exp. 1–3) were transduced with
PGK-Ctrl or PGK-ERDJ4ᴼᴱ lentivirus and expanded for 10 days in liquid
culture. Total population doublings of transduced GFP⁺ cells are shown.
g, ERDJ4ᴼᴱ increases HSC output. After liquid culture, GFP⁺ cells from f were
sorted and injected at high and low cell doses, indicated below the x axis. Total
human CD45⁺GFP⁺ engraftment in the injected femur after 10 weeks is
shown. P values were calculated using the Mann-Whitney U-test. Every symbol
represents one mouse; line shows median. *P < 0.05, **P < 0.01,
***P < 0.001.

Extended Data Figure 6 | Lineage differentiation, progenitor cell
frequencies, homing, and serial transplantability are maintained following
ERDJ4ᴼᴱ. a, PGK-ERDJ4ᴼᴱ-transduced cord blood maintains multilineage
differentiation potential in vivo. Left: gating scheme to assess differentiation of
the human graft in mouse bone marrow. Representative flow plots show
quantification of CD45⁺CD19⁺ B cells, CD45⁺CD33⁺ monocytes and
granulocytes, and CD45⁻GlyA⁺ erythroid cells within the GFP⁺ graft. Right:
the differentiation of transduced cord blood cells was assessed in the peripheral
blood (PB) at 10 and 20 weeks and in the injected (RF) and non-injected
(LF) femur at 20 weeks after transplantation. Results are shown as
mean ± s.e.m. of n = 15 mice representing n = 3 cord blood samples.
b, ERDJ4ᴼᴱ does not cause aberrant expansion of stem or progenitor cell
fractions. To assess the distribution of human stem and progenitor cells,
lineage⁺ and mouse cells were depleted from xenografted mouse bone marrow.
The remaining human lineage⁻ cells were analysed by flow cytometry. Left:
gating scheme to assess differentiation into HSC, MPP, MLP, CMP/MEP and
B/NK/GMP fractions (Extended Data Table 1). Right: the frequency of human
stem and progenitor cells within the human CD45⁺GFP⁺ graft was assessed
20 weeks after transplantation of transduced cord blood cells. Results are shown
as mean ± s.e.m. of n = 3 cord blood samples. c, Homing capacity to the
non-injected bone marrow is not altered by ERDJ4ᴼᴱ. Transduced cord blood
cells were expanded for 12 days in liquid culture conditions and 1–1.6 × 10⁶ cells
were transplanted per mouse. After 19 h, mice were euthanized to assess
human CD45⁺GFP⁺ cell homing to the non-injected femur. Results were
normalized to transduction efficiency. Every symbol represents one mouse;
results of n = 3 cord blood samples are shown with 2 mice per group each; line
shows median. d, Frequency of functional human HSCs in vivo is maintained
with ERDJ4ᴼᴱ. Cord blood cells were transduced and injected into primary
mice. After 10 weeks, mice were killed and transduced GFP⁺ cells were sorted
from their bone marrow. Thirty thousand to one million cells were
re-transplanted into secondary mice for serial LDA. After 10 weeks, the bone
marrow of secondary mice was assessed for human CD45⁺GFP⁺ engraftment;
mice were scored as positive if the engraftment level was >0.01%. Data from
n = 3 cord blood samples were pooled.

Extended Data Table 1 | Surface marker phenotypes to separate
human stem and progenitor cell subsets

Population    Surface phenotype
HSPC          CD34⁺CD38⁻
HSC           CD34⁺CD38⁻CD45RA⁻CD90⁺
CD49f⁺ HSC    CD34⁺CD38⁻CD45RA⁻CD90⁺CD49f⁺
MPP           CD34⁺CD38⁻CD45RA⁻CD90⁻
CD49f⁻ MPP    CD34⁺CD38⁻CD45RA⁻CD90⁻CD49f⁻
MLP           CD34⁺CD38⁻CD45RA⁺CD90⁻
Progenitor    CD34⁺CD38⁺
CMP/MEP       CD34⁺CD38⁺CD45RA⁻
B/NK/GMP      CD34⁺CD38⁺CD45RA⁺
CMP           CD34⁺CD38⁺CD45RA⁻CD10⁻CD135⁺
MEP           CD34⁺CD38⁺CD45RA⁻CD10⁻CD135⁻
B/NK          CD34⁺CD38⁺CD45RA⁺CD10⁺
GMP           CD34⁺CD38⁺CD45RA⁺CD10⁻CD135⁺

HSPC, haematopoietic stem and progenitor cell; HSC, haematopoietic stem cell; MPP, multipotent
progenitor; MLP, multilymphoid progenitor; CMP, common myeloid progenitor; MEP, megakaryocyte
erythrocyte progenitor; B/NK, B and NK cell progenitor; GMP, granulocyte macrophage progenitor.
‘HSPC’ to ‘HSC’ to ‘CD49f⁺ HSC’ indicates increasing purity of the population: approximately 1:75, 1:20
and 1:10, respectively.

Extended Data Table 2 | Surface marker phenotypes to separate
mouse stem and progenitor cell subsets

Population    Surface phenotype
LSK           Lin⁻Sca-1⁺c-Kit⁺
LT-HSC        Lin⁻Sca-1⁺c-Kit⁺Flk2⁻CD34⁻
ST-HSC        Lin⁻Sca-1⁺c-Kit⁺Flk2⁻CD34⁺
MPP           Lin⁻Sca-1⁺c-Kit⁺Flk2⁺CD34⁺
Progenitor    Lin⁻Sca-1⁻c-Kit⁺
CMP           Lin⁻Sca-1⁻c-Kit⁺FcγII/IIIR⁻CD34⁺
GMP           Lin⁻Sca-1⁻c-Kit⁺FcγII/IIIR⁺CD34⁺
MEP           Lin⁻Sca-1⁻c-Kit⁺FcγII/IIIR⁻CD34⁻

LSK, HSC-enriched Lin⁻Sca-1⁺c-Kit⁺ cells; LT-HSC, long-term haematopoietic stem cell; ST-HSC,
short-term haematopoietic stem cell; MPP, multipotent progenitor; CMP, common myeloid progenitor;
GMP, granulocyte macrophage progenitor; MEP, megakaryocyte erythrocyte progenitor.

Extended Data Table 3 | Primer sequences used for quantitative RT-PCR

Gene                   Primer    Sequence 5’ to 3’
ACTB                   forward   CCTGTACGCCAACACAGTGC
ACTB                   reverse   ATACTCCTGCTTGCTGATCC
ATF4                   forward   GCTAAGGCGGGCTCCTCCGA
ATF4                   reverse   ACCCAACAGGGCATCCAAGTCG
ATF6                   forward   ATGAAGTTGTGTCAGAGAACC
ATF6                   reverse   CTCTTTAGCAGAAAATCCTAG
CHOP/DDIT3             forward   GGAGCATCAGTCCCCCACTT
CHOP/DDIT3             reverse   TGTGGGATTGAGGGTCACATC
ERDJ4/DNAJB9           forward   TCGGCATCAGAGCGCCAAATCA
ERDJ4/DNAJB9           reverse   ACCACTAGTAAAAGCACTGTGTCCAAG
ERO1LB                 forward   TTCTGGATGATTGCTTGTGTGAT
ERO1LB                 reverse   GGTCGCTTCAGATTAACCTTGT
GADD34/PPP1R15A        forward   CCCAGAAACCCCTACTCATGATC
GADD34/PPP1R15A        reverse   GCCCAGACAGCCAGGAAAT
GRP78/HSPA5/BiP        forward   TGACATTGAAGACTTCAAAGCT
GRP78/HSPA5/BiP        reverse   CTGCTGTATCCTCTTCACCAGT
GRP94/HSP90B1/TRA1     Qiagen    Qiagen cat. number QT00046963
IRE1/ERN1              forward   TGCTTAAGGACATGGCTACCATCA
IRE1/ERN1              reverse   CTGGAACTGCTGGTGCTGGA
PERK/EIF2AK3           forward   AATGCCTGGGACGTGGTGGC
PERK/EIF2AK3           reverse   TGGTGGTGCTTCGAGCCAGG
PBGD/HMBS              forward   CATGTCTGGTAACGGCAATG
PBGD/HMBS              reverse   GTACGAGGCTTTCAATGTTG
Spliced XBP1           forward   CGCTTGGGGATGGATGCCCTG
Spliced XBP1           reverse   CCTGCACCTGCTGCGGACT
Total XBP1             forward   GGCATCCTGGCTTGCCTCCA
Total XBP1             reverse   GCCCCCTCAGCAGGTGTTCC

Human embryonic-stem-cell-derived
cardiomyocytes regenerate non-human primate hearts

James J. H. Chong, Xiulan Yang, Creighton W. Don, Elina Minami, Yen-Wen Liu, Jill J. Weyers,
William M. Mahoney Jr, Benjamin Van Biber, Savannah M. Cook, Nathan J. Palpant, Jay A. Gantz,
James A. Fugate, Veronica Muskheli, G. Michael Gough, Keith W. Vogel, Cliff A. Astley, Charlotte E. Hotchkiss,
Audrey Baldessari, Lil Pabon, Hans Reinecke, Edward A. Gill, Veronica Nelson, Hans-Peter Kiem,
Michael A. Laflamme & Charles E. Murry


Pluripotent stem cells provide a potential solution to current epidemic
rates of heart failure by providing human cardiomyocytes
to support heart regeneration. Studies of human embryonic-stem-
cell-derived cardiomyocytes (hESC-CMs) in small-animal models
have shown favourable effects of this treatment. However, it remains
unknown whether clinical-scale hESC-CM transplantation is fea- 
sible, safe or can provide sufficient myocardial regeneration. Here 
we show that hESC-CMs can be produced at a clinical scale (more 
than one billion cells per batch) and cryopreserved with good viabi- 
lity. Using a non-human primate model of myocardial ischaemia 
followed by reperfusion, we show that cryopreservation and intra- 
myocardial delivery of one billion hESC-CMs generates extensive 
remuscularization of the infarcted heart. The hESC-CMs showed pro- 
gressive but incomplete maturation over a 3-month period. Grafts 
were perfused by host vasculature, and electromechanical junctions 
between graft and host myocytes were present within 2 weeks of en- 
graftment. Importantly, grafts showed regular calcium transients 
that were synchronized to the host electrocardiogram, indicating 
electromechanical coupling. In contrast to small-animal models, non-fatal
ventricular arrhythmias were observed in hESC-CM-engrafted
primates. Thus, hESC-CMs can remuscularize substantial amounts 
of the infarcted monkey heart. Comparable remuscularization of a 
human heart should be possible, but potential arrhythmic compli- 
cations need to be overcome. 

Human pluripotent stem cells have indisputable cardiomyocyte-generating
abilities and have been extensively investigated for repair
of the injured heart. These stem cells are derived either from developing
blastocysts (human embryonic stem (ES) cells) or from reprogrammed
somatic cells (induced pluripotent stem cells (iPSCs)).
Although iPSCs have promising therapeutic potential, a number of
factors are likely to slow their regulatory approval. Human ES-cell
derivatives, on the other hand, are already being tested in humans for
retinal diseases and spinal cord injury. These indications require
small numbers of differentiated cells, ranging from 10⁴ to 10⁷. By contrast,
cardiac repair will require orders of magnitude more cells, because
a billion cardiomyocytes are lost after a typical infarct. At present it is
unknown whether this large-scale production of hESC-CMs is feasible. 
Furthermore, it remains unclear whether the favourable cardiac repair 
findings in small-animal models will be reproduced in more clinically 
relevant large-animal models. As an important translational step towards 
creating a viable clinical therapy, we investigated the ability of exogen- 
ously delivered hESC-CMs to engraft and electrically couple to host 
myocardium in a non-human primate model of myocardial infarction.
Notably, this model provides a heart size and rate more comparable to
those of humans.

Extrapolating results from our previous studies in smaller mammals,
where 10⁵ cardiomyocytes were required in mice, 10⁷ in rats and
10⁸ in guinea pigs, we reasoned that sufficient engraftment in the
larger non-human primate heart required delivery of 1 × 10⁹ cells.
Feasibility of this large-scale hESC-CM delivery requires cryopreservation
of cells, which we validated in an established immunodeficient
mouse model of myocardial infarction. Similar to previous reports,
we found no adverse impact of cryopreservation on hESC-CM graft
size (Extended Data Fig. 1). Therefore, delivery of cryopreserved hESC-CMs
seems to be a sound strategy for large-scale transplantation in large
animals or humans.

We previously used zinc-finger nuclease (ZFN)-mediated gene targeting
to create hESC-CMs (H7 parental ES-cell line) stably expressing
the genetically encoded fluorescent calcium indicator GCaMP3 from
the AAVS1 locus (Extended Data Fig. 2a). These were used to show that
exogenously delivered hESC-CMs could electrically couple to the host
heart in a guinea pig model of myocardial infarction. For the first two
non-human primate experiments we used this same cell line. Routine
karyotyping after two experiments revealed duplication of the long
arm of chromosome 20 (Extended Data Fig. 3a). Reanalysis of two
previous karyotypes from this line revealed this subtle duplication to
be present in cells delivered to both monkeys. As the effect of this
abnormality on hESC-CM engraftment and function is unknown, we
created another, karyotypically normal GCaMP3 human ES-cell line
for comparison. The ZFN approach was again used to target the GCaMP3
construct to the AAVS1 locus (Extended Data Fig. 2a) in Rockefeller
University embryonic stem cell line 2 (RUES2) human ES cells. Southern
blotting revealed correct targeting of the construct (Extended
Data Fig. 2b) and karyotyping was normal after expansion (Extended
Data Fig. 3b). For both of these GCaMP3 ES-cell lines we used our
well-established monolayer protocol of directed differentiation (as described
earlier) to produce a high yield of cardiomyocytes. Flow cytometry
was used to assess cardiomyocyte purity, and the hESC-CMs
used in these studies were 73 ± 12% positive for cardiac troponin T
(cTnT; Extended Data Fig. 4). Spontaneous beating was observed
in vitro for hESC-GCaMP3-CMs with robust fluorescence with each
contractile cycle (Supplementary Videos 1 and 2).

Table 1 | Macaque characteristics with morphometry and calcium imaging summary

Animal      Sex  Age    Body    Heart   LV      Treatment        Endpoint          Infarct  Infarct  Graft  Graft   Graft
identifier              weight  weight  weight                                     mass     size     mass   size    coupled
                        (kg)    (g)     (g)                                        (g)      (% LV)   (g)    (% LV)  (%)
P2          F    10y6m  8.6     39      23.3    No-cell control  2 weeks (sham)    1.7      7.3      N/A    N/A     N/A
P3          F    11y8m  9.2     38      14.9    H7-GCaMP3-CM     2 weeks (cells)   0.8      5.3      0.2    1.4     100
P4          M    9y5m   9.5     37      19.9    H7-GCaMP3-CM     4 weeks (cells)   1.9      9.5      0.2    0.7     100
P5          M    6y     12.3    52      20.7    RUES2-GCaMP3-CM  4 weeks (cells)   0.5      2.5      1.1    5.3     100
P6          M    5y     9.7     48      29.3    RUES2-GCaMP3-CM  12 weeks (cells)  1.1      3.7      0.3    1.0     100
P7          F    14y    8.4     36      19.5    No-cell control  4 weeks (sham)    2.0      10.4     N/A    N/A     N/A

LV, left ventricle; N/A, not applicable.

Seven pigtail macaques (Macaca nemestrina) were used for the study
without randomization (Table 1). Myocardial infarction was created
by ischaemia followed by reperfusion using a percutaneous balloon
catheter 2 weeks before hESC-CM delivery, with immunosuppression
starting 5 days before cell delivery (see Methods and Extended Data
Fig. 5). hESC-CMs were delivered into the infarct region and surround- 
ing border zones under direct surgical visualization using a method 
optimized to aid cell retention (Extended Data Fig. 6). All macaques 
underwent full necropsy after being euthanized. Consistent with our 
previous results, no macroscopic or microscopic evidence of teratoma
or other tumour was detected, and human cells were not identified
outside the heart. All macaques had patchy transmural myocardial
infarctions. Infarct sizes in sham-treated hearts (at 14 and 28 days after
engraftment) were 7.3 and 10.4% of the left ventricle (Table 1), whereas
infarcts in cell-treated hearts ranged from 3.7 to 9.5% of the left ventricle
(mean of 5.2 ± 1.5%; Table 1). All hESC-CM-treated monkeys showed
extensive remuscularization of the infarct areas (Fig. 1a–g and Extended
Data Fig. 7). Graft size, calculated on the basis of green fluorescent
protein (GFP) expression, ranged from 0.7 to 5.3% of the left ventricle
(mean of 2.1 ± 1.1%; Table 1), averaging 40% of the infarct volume.
More than 98% of engrafted human cells expressed the sarcomeric
protein α-actinin (Extended Data Fig. 8a), indicating that almost all
graft cells were cardiomyocytes. Furthermore, these hESC-CMs showed
increased maturation from day 14 to day 84, as evidenced by increased
myofibril alignment, sarcomere registration and cardiomyocyte diameter
(Fig. 2a–c and Extended Data Fig. 8b–f). As these conclusions are drawn
from small numbers of animals per time point (n = 1 each for day 14 and
day 84, n = 2 for day 28), maturation will require further validation.
The cardiomyocyte diameter of day 84 grafts was 10.9 ± 2 µm, approximately
the size of normal adult monkey cardiomyocytes (10.1 µm) and
approaching the 11–13 µm diameter seen in normal adult human hearts.
Additionally, a maturation gradient was apparent, with cardiomyo- 
cytes at the edge of grafts exhibiting greater maturation than those 
within the central core (Fig. 2f-k). There were frequent host-graft con- 
tacts (Fig. 1g) where nascent intercalated disks formed and expressed 


GFP (human) a-Actinin (human + monkey) Nuclei 


the adherens junction protein N-cadherin and the gap junction protein 
connexin 43. From day 14 to day 84 the expression of these junctional 
proteins increased substantially (Fig. 2d, e, |-q). Few CD3* T lymp- 
hocytes or CD20* B lymphocytes were found within or around the 
hESC-CM grafts, suggesting that our immunosuppression successfully 
prevented graft rejection (Extended Data Fig. 9). 
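
The summary statistics quoted in this paragraph can be cross-checked against Table 1 with a short calculation. The sketch below is illustrative only: the per-animal percentages are as read from Table 1 in this copy (several digits are hard to read, so they should be treated as assumptions), and the function and variable names are ours, not part of the study's analysis.

```python
from math import sqrt
from statistics import mean, stdev

# Percent-of-left-ventricle values for the four cell-treated animals
# (P3, P4, P5, P6) as read from Table 1; treat them as assumptions.
infarct_pct_lv = [5.3, 9.5, 2.5, 3.7]
graft_pct_lv = [1.4, 0.7, 5.3, 1.0]


def mean_sem(values):
    """Return (mean, standard error of the mean) for a small sample."""
    return mean(values), stdev(values) / sqrt(len(values))


infarct_mean, infarct_sem = mean_sem(infarct_pct_lv)
graft_mean, graft_sem = mean_sem(graft_pct_lv)

# Ratio of mean graft size to mean infarct size, one simple way to
# express how much of the infarct the graft occupies on average.
graft_fraction_of_infarct = graft_mean / infarct_mean

print(f"infarct: {infarct_mean:.2f} +/- {infarct_sem:.2f} % LV")
print(f"graft:   {graft_mean:.2f} +/- {graft_sem:.2f} % LV")
print(f"graft / infarct: {graft_fraction_of_infarct:.0%}")
```

With these inputs the results land close to the reported means, s.e.m. values and the roughly 40% graft-to-infarct ratio, which suggests the reported spread is a standard error over n = 4 animals.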

hESC-CM grafts were perfused by host vessels, as evidenced by 
anti-CD31 immunostaining without GFP co-expression (Fig. 1h, i). 
Microcomputed tomography was used to image the three-dimensional 
structure of the coronary vasculature, which was correlated to aligned 
histological sections, permitting analysis of coronary anatomy within 
the graft, scar and remote myocardium (Fig. 3a-d and Supplemen- 
tary Video 3). Graft and scar regions were integrated into the three- 
dimensional vascular network, revealing arteries and veins supplying 
the hESC-CM graft that were connected to the host system. This shows, 
to our knowledge for the first time, that large hESC-CM grafts are suc- 
cessfully perfused by host vasculature and are viable long term. 

To investigate electromechanical coupling of hESC-CM grafts to the 
host, hearts from all macaques were subjected to ex vivo fluorescent 
imaging using a modified Langendorff perfusion system (Supplemen- 
tary Video 4). Hearts were perfused with 2,3-butanedione monoxime 
(BDM, a myosin crossbridge inhibitor) to uncouple electrical cardio- 
myocyte excitation from mechanical contraction. This removed con- 
founding motion artefacts and prevented indirect graft activation by 
passive stretching. Epicardial fluorescent calcium transients were seen 
in all hESC-CM-treated hearts, indicating electrical activation of the 
cardiomyocyte grafts (Fig. 4a-d and Supplementary Videos 5, 6). Fur- 
thermore, 100% of the visible hESC-CM grafts in every monkey showed 
electromechanical coupling to the host heart (Table 1). Graft-host coupling
was evidenced by epicardial fluorescent transients that were synchronous
with the host electrocardiogram (ECG) QRS complexes during
spontaneous depolarization (Fig. 4e). hESC-CM grafts retained 1:1
coupling to host myocardium during atrial pacing at rates of up to 240
beats per minute, the highest rate tested (Fig. 4f-h).

Figure 1 | Remuscularization of the infarcted macaque heart with human
cardiomyocytes. a-i, Confocal immunofluorescence of macaque
hearts subjected to myocardial infarction and
transplantation of hESC-CMs. Grafts were studied
at day 14 (a-g) and day 84 post-engraftment
(h, i). a, Remuscularization of a substantial portion
of the infarct region (dashed line) with hESC-CMs
co-expressing GFP. The contractile protein
α-actinin (red) is expressed by both monkey and
human cardiomyocytes. Scale bar, 2,000 μm.
b-f, Images from the peri-infarct region of the
same heart shown in a, demonstrating extensive
hESC-CM engraftment. Scale bars: 1,000 μm (b-e);
200 μm (f). g, Graft-host interface (arrows) at
day 14 with interconnected α-actinin (red)-expressing
cardiomyocytes. Note that
host sarcomeric cross-striations (asterisks) show
greater alignment than hESC-CM graft. Scale
bars, 25 μm. h, i, Day 84 hESC-CM grafts contain
host-derived blood vessels lined by CD31+
endothelial cells. Scale bars, 20 μm. Inset scale bar,
10 μm. Panel labels: GFP (human), α-actinin (human + monkey)
and nuclei (a-g); GFP (human), CD31 (human + monkey) and nuclei (h, i).

Figure 2 | Human cardiomyocyte grafts mature
with time from engraftment. a, Cardiomyocyte
diameter of hESC-CMs shows significant increase
from 14 (n = 1) to 28 (n = 2) days and from 28
to 84 (n = 1) days after engraftment. Adult
monkeys (n = 2). From each animal 200-400 cells
were counted from three histological sections at
varied left ventricular levels. Mean ± standard
error of the mean (s.e.m.) is shown. b-q, Confocal
immunofluorescence of macaque hearts subjected
to myocardial infarction and transplantation of
hESC-CMs 14 days (b, d, f-h, l-n) or 84 days
(c, e, i-k, o-q) after engraftment. P3 and P6 are
animal identifiers. Increased myofibril content,
sarcomere alignment and cardiomyocyte size in
hESC-CMs (GFP+) are seen in longer-term grafts
(b, c). Connexin43 (CX43) expression is not
evident in hESC-CM grafts at 14 days but is seen
at 84 days (d, e). Cardiomyocytes at the edges of
grafts (g, j, m, p) show greater maturation compared
with those at the central core (h, k, n, q), as
evidenced by increased size, α-actinin staining
intensity, sarcomere alignment (g-k) and
N-cadherin expression (m-q). Scale bars for panels
f, i, l and o, 100 μm. All other scale bars, 20 μm.
Yellow and white boxes correspond to higher-power
fields of graft edge and core, respectively.

Figure 3 | Blood vessels extend from the host coronary network into the
graft. a-c, Three-dimensional rendered microcomputed tomography
of heart perfused with Microfil at 3 months after hESC-CM injection.
b, Higher-power view of boxed area from a. c, Cross-sectional cut plane
through the heart at the location of the dotted line in a. Arteries perfusing the
graft are red, other vessels are grey in the uninjured cardiac tissue, or white
within the graft. The vessels within the graft are better visualized in
Supplementary Video 6. d, A histological section of the heart shown in a-c was
immunostained with an anti-GFP antibody to mark the hESC-CM graft
(brown). This section corresponds to the same location of the cross-sectional
cut plane in c. Black dots are Microfil within coronary vessels.
Key: large arteries feeding the graft; other vessels in myocardium;
small graft vessels; outline of graft along slice surface.

274 | NATURE | VOL 510 | 12 JUNE 2014
©2014 Macmillan Publishers Limited. All rights reserved


To explore the electrophysiological consequences of our hESC-CM 
grafts, we analysed ECGs obtained by telemetry from the time of in- 
farction until death. Continuous ECG recordings were taken regularly 
and 24 h periods (midnight to midnight) were analysed. Control maca- 
ques with myocardial infarctions and sham (vehicle only) injections 
maintained normal sinus rhythm with heart rates of 100-130 beats per 
minute throughout the experiment (Fig. 5a). No arrhythmias were 
noted in hESC-CM-treated monkeys during the period after myocar- 
dial infarction but before hESC-CM delivery (Fig. 5e-h). By contrast, 
all macaques that received hESC-CMs showed arrhythmias. These 
included premature ventricular contractions and runs of ventricular 
tachycardia (defined as wide QRS complex (>60 ms) with rate >180 
beats min⁻¹; Fig. 5c). Frequent wide QRS complex rhythms with rates 
similar to baseline (accelerated idioventricular rhythm; Fig. 5b) were 
also observed. Notably, all animals remained conscious and in no dis- 
tress during all periods of arrhythmia. 

To investigate left ventricular function, we performed trans- 
oesophageal echocardiography before myocardial infarction, before 
hESC-CM delivery and immediately before the end of the experiment 
(Extended Data Fig. 10b). Multiple trans-oesophageal and deep trans- 
gastric views were analysed by cardiologists blinded to experimental 
details. We were unable to obtain images of sufficient quality for ana- 
lysis from one control animal. The other control demonstrated a decline 
in ejection fraction after myocardial infarction that was unchanged after 
sham cell injection. The hearts receiving hESC-CMs showed variable 
responses, some exhibiting an increased ejection fraction after treat- 
ment and others showing no improvement. Owing to the small group 
sizes, no statistically significant effects were noted. 

These experiments demonstrate that hESCs can be grown, differ- 
entiated into cardiomyocytes and cryopreserved at a scale sufficient 
to treat a large-animal model of myocardial infarction. With further 
refinements in manufacturing, the scale up to trials in human patients 
seems feasible. Large-animal models are important forerunners to 
human trials, because they impart real-world rigour to issues such as 
cell production, delivery and end-point analyses, while permitting 
mechanistic studies not possible in patients. We observed extensive 
remuscularization of the infarcts in all animals, with grafts averaging 
40% of infarct mass. Importantly, all of the human cardiomyocytes 
showed complete electrical coupling to the primate heart and responded 
normally to pacing up to 240 beats per minute (the fastest rate attempted). 
The coupling seen in this study was greater than that observed in our
guinea pig model, where only 60% of recipient hearts had grafts that
were synchronized with the host. This enhanced coupling may have
resulted from the use of an ischaemia-reperfusion model, which gives
patchier infarcts with more peninsulas of viable host tissue than the
guinea pig cryo-injury model.

Our previous studies in mice, rats and guinea pigs gave no evidence
of arrhythmias after hESC-CM engraftment, whereas here we consistently
observed arrhythmias. There are several possible mechanisms
for the observed arrhythmias, including re-entrant circuits or graft
automaticity. Further studies are required to distinguish between these
possibilities. The most likely reasons why arrhythmias were observed in
monkeys but not in smaller animals seem to be differences in heart size
and rate. Regarding size, the larger hearts of adult macaques (37-52 g)
compared with the hearts of mice (0.15 g), rats (1 g) and guinea pigs
(3 g) allow more hESC-CMs to be delivered, and the resultant
grafts are approximately tenfold larger than the largest obtained in
other species. Ventricular depolarization over integrated but relatively
immature hESC-CM grafts may slow conduction of the overall wave
front. Although not problematic over short distances in small grafts,
over longer distances (in large grafts) this may favour formation of
re-entrant loops. It is noteworthy that the animal (P5) with the largest
hESC-CM graft size also had the highest frequency of arrhythmia.
Another important factor is the species-specific heart rates (macaques
100-130 beats min⁻¹ versus guinea pigs 230 beats min⁻¹, rats
~400 beats min⁻¹ and mice ~600 beats min⁻¹). Faster spontaneous
rates will favour ventricular capture from native conduction pathways
rather than graft automaticity or re-entrant loops, and this would
probably prevent sustained ventricular arrhythmias. These factors are
relevant to clinical translation given that the human heart is larger
(300 g) with a slower basal rate (70 beats min⁻¹) than that of macaques.

Figure 4 | Human cardiomyocytes are electrically coupled 1:1 to the
infarcted host macaque heart after transplantation. a, Diagram showing
regions of the infarcted macaque heart visualized in b-d. Analysis shown is
from ex vivo imaging 14 days after hESC-CM delivery. b, Still image from
low-power fluorescence video showing regions of hESC-CM engraftment (red
and blue rectangles). c, d, Still images of calcium indicator GCaMP3-positive
hESC-CM grafts (bottom left of panel b) during diastole (c) and systole (d).
Note the gain of fluorescence during systole. e-h, GCaMP3 fluorescence
intensity (arbitrary units (AU)) and ECG versus time for the grafted regions of
interest shown in b. Each graft region shows 1:1 coupling synchronous
with host ventricular contraction (ECG QRS complex) during spontaneous
rhythm (e) or atrial pacing (f-h). All hESC-CM grafts identified in every
transplanted animal showed 1:1 coupling.

Figure 5 | Ventricular arrhythmias after hESC-CM transplantation.
a-d, Representative traces from macaque telemetric ECG recordings showing
normal sinus rhythm (SR; a), accelerated idioventricular rhythm (AIVR;
b), ventricular tachycardia (VT; c) and non-sustained VT (NSVT; d). Scale bar,
1 s. e-h, Frequency of arrhythmias is highest within the first 2 weeks after
hESC-CM transplantation. P2-7 designations are animal identifiers. Animals
receiving vehicle only (no cells, P2 and P7) remained in SR throughout.
Interrupted y-axis in e, f denotes reduced number of episodes but increased
total duration of arrhythmias (VT or AIVR for more than 18 h per 24 h period).
NSAIVR, non-sustained accelerated idioventricular rhythm.

The principal limitations of this study are the small numbers of 
animals used and their relatively small infarct sizes. Both limitations 
stem from the high cost and value of the primate model. Consequently, 
we cannot determine with statistical certainty that the observed 
arrhythmias directly result from transplanted hESC-CMs. Larger studies 
will be required to assess this and the treatment effects on cardiac func- 
tion. Importantly, infarct sizes in this study were smaller than the clin- 
ically severe infarcts that might benefit most from hESC-CM therapy. 
Larger infarcts, in human hearts, might manifest more arrhythmias. 
Because ventricular arrhythmias can be life threatening, they need to 
be understood mechanistically and managed en route to safe clinical 
translation. Nevertheless, the extent of remuscularization and electro- 
mechanical coupling seen here encourages further development of human 
cardiomyocyte transplantation as a clinical therapy for heart failure. 


METHODS SUMMARY 


Human ES cells were differentiated into cardiomyocytes by induction with activin
A and BMP4, as previously reported. To enhance engraftment, cardiomyocytes
were subjected to heat shock followed by treatment with a pro-survival cocktail
before cryopreservation. GCaMP3-positive human ES cells were generated by
ZFN-mediated targeting to the AAVS1 locus, following methods described previously.
Details of mouse and macaque procedures are provided in Methods. Microcomputed
tomography was performed as previously described with minor modifications.
All procedures complied with the regulations of and were approved by the
University of Washington Institutional Animal Care and Use Committee.

METHODS

Cell preparation. Undifferentiated H7 (ref. 11) or RUES2 human ES cells were
expanded using mouse embryonic fibroblast-conditioned medium (MEF-CM)
supplemented with basic fibroblast growth factor (R&D Systems). The H7 line was
obtained from the WiCell Research Institute and the RUES2 line from Rockefeller
University. Both lines were regularly karyotyped and tested for mycoplasma.
Human ES cells were then differentiated into cardiomyocytes using a previously
reported directed differentiation protocol. Briefly, activin A (R&D Systems) and
bone morphogenetic protein 4 (BMP4, R&D) were applied under defined, serum-free,
monolayer culture conditions. hESC-CMs were collected and cryopreserved
after 16-20 days of CM differentiation. One day before collection, cells were
subjected to a pro-survival 'cocktail' (PSC) protocol, previously shown to enhance
engraftment after transplantation. Briefly, cultures were heat-shocked with a
30 min exposure to 43 °C medium, followed by RPMI-B27 medium supplemented
with IGF1 (100 ng ml⁻¹, Peprotech) and cyclosporine A (0.2 μM, Sandimmune,
Novartis). One day later, cultures were collected with 0.25% trypsin/0.5 mM
EDTA (Invitrogen) and cryopreserved as described previously. Immediately
before transplantation, cells were thawed at 37 °C, washed with RPMI, and
suspended in a 1.5 ml volume (per animal) of modified PSC consisting of 50% (v/v)
growth-factor-reduced Matrigel, supplemented with BCL-xl BH4 (cell-permeant TAT
peptide, 50 nM, Calbiochem), cyclosporine A (200 nM, Wako), IGF1 (100 ng ml⁻¹,
Peprotech) and pinacidil (50 μM, Sigma).

Generation of the GCaMP-reporter human ES-cell line. A transgene encoding
for the constitutive expression of GCaMP3 was inserted into the AAVS1 locus in
H7 and RUES2 human ES cells, using methods adapted from a previous study
(see Extended Data Fig. 2). In brief, the right and left arms of an AAVS1-specific
ZFN were de novo synthesized (Genscript) and cloned into a single polycistronic
plasmid in which the expression of each was driven by an independent human
PGK promoter. A second polycistronic vector was generated in which homology
arms (approximately 800 bp) flanking the AAVS1 ZFN cut site (pZDonor,
Sigma Aldrich) surrounded a 5.1 kb insert with two elements: a cassette in which
the CAG promoter drives expression of GCaMP3 (Addgene, plasmid #22692) and
a second cassette encoding for PGK-driven expression of neomycin resistance.
AAVS1 ZFN (5 μg) and AAVS1 CAG GCaMP3 targeting vector plasmids were
co-electroporated (Lonza, Nucleofection system) into human ES cells cultured in
MEF-CM supplemented with 10 μM Y-27632. Green fluorescent colonies were
isolated, expanded and selected with 40-100 μg ml⁻¹ G418 (Invitrogen) for
5-10 days.

Southern blot analysis. Wild-type and transgenic GCaMP3-positive human ES-cell
genomic DNA were digested with the restriction enzymes NdeI and NheI, run
on 1% polyacrylamide gel and transferred to a membrane (BioRad Zeta Probe).
The membrane was washed in 2× SSC and dried at 80 °C in a hybridization oven
for 2 h, followed by 1 h of pre-hybridization in 50% formamide, 0.12 M NaH2PO4,
0.25 M NaCl, 7% SDS and 1 mM EDTA at 43 °C. A genomic probe was generated
using the following primers: GGAGGTGGTGCGCTTCTTGG (forward),
CGCATCCCCTCCCAGAAAGAC (reverse), and neomycin cassette probe:
ATGGGATCGGCCATTGAACAAG (forward), GAAGAACTCGTCAAGAAGGCG (reverse).
The probes were labelled with 32P-dCTP (Amersham Megaprime DNA labelling
system) and hybridized overnight in hybridization buffer at 43 °C. After 24 h, the
membrane was washed for 20 min with 2× SSC/0.1% SDS followed by 20 min in
0.1× SSC/0.1% SDS. The membrane was then exposed to autoradiographic film
for 3 days.

Animal models. All procedures complied with and were approved by the Uni- 
versity of Washington Animal Care and Use Committee. 

Mouse surgery. Male SCID-beige mice aged 8 weeks (Taconic Farm) were
anaesthetized with Avertin, intubated and ventilated before undergoing
thoracotomy and ligation of the left anterior descending artery. Immediately after
ligation, 1 × 10° hESC-CMs (freshly isolated or cryopreserved, allocated in a
non-randomized and unblinded manner) in a volume of 5 μl were injected directly into
the infarct region and surrounding border zones. Five days after myocardial
infarction creation, mice were euthanized and hearts collected for detection
of human cell grafts by previously described methods (see later and Extended
Data Fig. 1).

Non-human primate surgery. M. nemestrina (8.6-12.3 kg, Washington National
Primate Center) of either sex were used for these experiments. Ages are specified
in Table 1. Macaques first underwent a 2-week period of acclimation and training
to wear a mesh jacket to prevent removal of the intravenous (i.v.) catheter. Starting
5 days before myocardial infarction, macaques were treated with amiodarone
(100 mg daily with feed), which was continued for a further 10 days after
myocardial infarction. For all major surgery, macaques were anaesthetized with
ketamine and propofol, intubated and ventilated using sevoflurane to maintain
anaesthesia. Fentanyl and buprenorphine were administered to provide perioperative and
postoperative pain relief. Before each major surgery, trans-oesophageal echocardiography
was performed using a Philips HD-11XE ultrasound machine with an S7 2 MHz
trans-oesophageal probe.

Before myocardial infarction creation, an i.v. lidocaine bolus (1 mg kg⁻¹) and
infusion (20 μg kg⁻¹ min⁻¹) were used to prevent ventricular arrhythmias. Heparin was
delivered i.v. to maintain activated clotting times of 250-350 to prevent
thrombosis. Under fluoroscopic guidance a 5F coronary catheter was used to engage the
left main coronary artery. A guide wire and angioplasty balloon were passed into the
mid-left anterior descending artery and the balloon inflated for 90 min. Myocardial
infarction was confirmed by ST segment elevation on ECG and by subsequent
serum assays for cardiac troponin and creatine kinase. For telemetric monitor
implantation a CTA-D70 (Data Sciences International) transmitter was placed
subcutaneously over the abdomen with leads tunnelled subcutaneously in a
modified lead II configuration. Immune suppression was achieved by methylprednisolone
i.v. 500 mg on the day before hESC-CM delivery then maintenance doses of
0.1-1.5 mg kg⁻¹ until monkeys were euthanized, cyclosporine to maintain serum
trough levels of 200-250 μg l⁻¹ from 5 days before hESC-CM delivery until
macaques were euthanized, and abatacept (CTLA4 immunoglobulin) 12.5 mg kg⁻¹ on
the day before hESC-CM delivery and every 2 weeks thereafter. To prevent opportunistic
infections, broad-spectrum antibiotics and anti-fungal agents were administered.

On day 14 after myocardial infarction, macaques were anaesthetized and
underwent left thoracotomy. The heart was exposed and a pericardial cradle created. The
infarct region was directly visualized and hESC-CMs were delivered
intramyocardially into the infarct region and adjacent border zones via 15 injections each of
100 μl volume. Needle tips were placed within a preformed mattress suture, and
three injections were delivered via the same epicardial puncture, changing the
trajectory of the needle for each. Before withdrawal of the needle the mattress
suture was closed around the needle tip to facilitate cell retention. For control
macaques, an equal volume of PSC-RPMI vehicle was injected in the same manner
as for hESC-CM delivery. hESC-CM-treated animals also received epicardial
application of 1-3 tissue-engineering constructs where hESC-CMs were seeded
in a collagen scaffold. (These tissue-engineered constructs did not adhere to the
epicardial surface and were not recovered at the end of the experiment.)
Euthanasia was induced by i.v. injection of pentobarbital and phenytoin (Beuthanasia-D)
followed by supersaturated KCl and Beuthanasia (CIII). Hearts were removed and
perfused with University of Wisconsin cardioplegia solution before transportation
on ice for calcium imaging experiments. Seven macaques were subjected to
myocardial infarction. One was euthanized 2 days post-infarction (no treatment)
because of lower limb ischaemia secondary to arterial thrombosis, and this was
the only animal excluded from analysis. All others survived to the completion of
the experiment. Two macaques (one cell-treated and one vehicle-only control)
were euthanized at day 14, three macaques (two cell-treated and one vehicle-only
control) were euthanized at day 28, and one cell-treated monkey was euthanized
84 days (3 months) after hESC-CM delivery. Control and cell-treatment groups
were allocated in an unblinded and non-randomized manner.

PCR detection of human ES-cell grafts. A high-throughput method of human
cell detection was used as previously reported. Briefly, hearts from mice engrafted
with hESC-CMs were washed, snap frozen in liquid nitrogen and homogenized
using a dis-membranator (Braun). Samples were resuspended in 200 μl of RNase/
DNase-free water supplemented with proteinase K and Chelex beads. Samples
were centrifuged and a 2 μl sample of the DNA-containing supernatant removed
for subsequent PCR using Alu-specific primers. Data were compared to standard
curves generated with known human DNA quantities.
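
Comparing a sample to a standard curve usually means fitting the signal of the known standards against the logarithm of their DNA quantity and then inverting the fit. The sketch below assumes a real-time PCR readout (a threshold cycle, Ct, that falls linearly with log10 of input DNA); the source does not state the readout, and every number and function name here is hypothetical rather than taken from the study.

```python
from math import log10


def fit_standard_curve(dna_ng, ct_values):
    """Least-squares fit of Ct = slope * log10(DNA) + intercept."""
    xs = [log10(q) for q in dna_ng]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ct_values) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def dna_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the DNA quantity for a sample."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical standards: known human DNA quantities (ng) and their Ct values.
standards_ng = [0.01, 0.1, 1.0, 10.0]
standards_ct = [30.0, 26.7, 23.4, 20.1]

slope, intercept = fit_standard_curve(standards_ng, standards_ct)
print(dna_from_ct(25.0, slope, intercept))  # estimated ng of human DNA
```

Fitting on a log scale keeps standards spanning several orders of magnitude equally weighted, which is the usual reason for this choice.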

Imaging of GCaMP3-expressing grafts. Intravital imaging of hearts with
GCaMP3-positive grafts was performed on days 14, 28 or 84 after hESC-CM
transplantation using an ex vivo preparation. For these experiments, the heart was mounted
on a gravity-fed Langendorff apparatus and then perfused at 100 mm Hg with
modified Tyrode solution at 37 °C. The epicardial GCaMP3 signal was then
recorded before and after supplementation of the perfusate with 2,3-butanedione
monoxime (BDM; 20 mM). The GCaMP3 signal was visualized using an
epifluorescence stereomicroscope (Nikon, SMZ 1000) equipped with an EXFO X-Cite
illumination source. GCaMP3 was excited at 450-490 nm and bandpass filtered
(500-550 nm) before detection by an electron-multiplying, charge-coupled device
camera (Andor iXon 860 EM-CCD) controlled by Andor Solis software. GCaMP3
image acquisition was typically at 80-140 frames per second (f.p.s.). Signals from
the charge-coupled device (CCD) camera and the surface ECG were fed through
a computer for digital storage and off-line analysis using Andor software and
Labchart.
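
Once fluorescence transients and QRS complexes have been reduced to event times, 1:1 graft-host coupling can be checked by pairing each QRS complex with the transient that follows it. The sketch below is a minimal illustration of that idea, not the off-line analysis actually performed in Andor software and Labchart; the function names, the 0.2 s matching window and the event times are all hypothetical.

```python
def coupling_ratio(qrs_times_s, transient_times_s, window_s=0.2):
    """Fraction of QRS complexes followed by exactly one calcium transient.

    A transient is matched to a QRS complex if it starts within
    `window_s` seconds after that complex (hypothetical tolerance).
    """
    if not qrs_times_s:
        raise ValueError("need at least one QRS complex")
    matched = 0
    for qrs in qrs_times_s:
        hits = [t for t in transient_times_s if 0.0 <= t - qrs <= window_s]
        if len(hits) == 1:
            matched += 1
    return matched / len(qrs_times_s)


def is_one_to_one(qrs_times_s, transient_times_s, window_s=0.2):
    """True when every QRS has one transient and no transient is left over."""
    return (
        len(qrs_times_s) == len(transient_times_s)
        and coupling_ratio(qrs_times_s, transient_times_s, window_s) == 1.0
    )


# Hypothetical event times (seconds) for pacing at 240 beats per minute,
# that is, one beat every 0.25 s; these are not data from the study.
qrs = [0.00, 0.25, 0.50, 0.75]
transients = [0.03, 0.28, 0.53, 0.78]
print(is_one_to_one(qrs, transients))
```

The matching window must stay shorter than the beat interval at the fastest pacing rate, otherwise one transient could be assigned to two consecutive QRS complexes.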

Echocardiography. Images were acquired with an HD11-XE (Philips) with S7
2 MHz trans-oesophageal probe. Trans-oesophageal four-chamber, two-chamber
and short-axis views were collected together with deep trans-gastric short-axis views.
Functional analysis was performed using XCelera (Philips) software by two
independent cardiologists blinded to experimental conditions.
Telemetric ECG. ECG recordings were acquired from conscious, freely mobile
animals using a Dataquest ART telemetry system (DSI). Recordings from 24 h
periods (midnight to midnight) were obtained from macaques with myocardial
infarction with or without hESC-CM delivery. All ECG traces were evaluated
manually by a cardiologist using Ponemah software (DSI), who determined the
total number and frequency of events. Ventricular tachycardia (VT) was defined as
a run of four or more premature ventricular complexes (PVCs) with a ventricular
rate of more than 180 beats per minute. Accelerated idioventricular rhythm (AIVR)
was defined as four or more PVCs with a rate of less than 180 beats per minute. VT
or AIVR was considered sustained if the duration was greater than 30 s.
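
The rhythm definitions above amount to a small decision rule, sketched below for illustration. The function name and argument names are ours; the handling of a rate of exactly 180 beats per minute is an assumption, because the definitions leave that boundary unspecified.

```python
def classify_run(n_pvcs, rate_bpm, duration_s):
    """Label a run of premature ventricular complexes (PVCs).

    Returns "VT", "NSVT", "AIVR", "NSAIVR", or None when the run has
    fewer than four PVCs and so meets neither definition.
    """
    if n_pvcs < 4:
        return None
    # A rate of exactly 180 beats per minute is not covered by the stated
    # definitions; grouping it with AIVR here is an assumption.
    rhythm = "VT" if rate_bpm > 180 else "AIVR"
    sustained = duration_s > 30
    return rhythm if sustained else "NS" + rhythm


# Hypothetical runs, not events recorded in the study.
print(classify_run(n_pvcs=6, rate_bpm=200, duration_s=2))
print(classify_run(n_pvcs=500, rate_bpm=120, duration_s=300))
```

The "NS" prefix mirrors the NSVT and NSAIVR abbreviations used for non-sustained episodes in Fig. 5.
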
Histology and immunohistochemistry. Histological studies were carried out as
detailed previously by our group with some adaptation. For immunohistochemistry,
we used the primary antibodies detailed in Extended Data Fig. 10,
then either fluorescent secondary antibodies (Alexa-conjugated, species-specific
antibodies from Molecular Probes) or the avidin biotin reaction followed by
chromogenic detection (ABC kits from VectorLabs). Paraformaldehyde-fixed
macaque hearts were dissected to remove the atria and right ventricle before
cross-sections were obtained by sectioning parallel to the short axis at ~3 mm thickness
on a commercial slicer (Berkel). Whole heart, left ventricle and each slice were
weighed before tissue processing. For morphometry, infarct regions were
identified by Picrosirius red staining and areas calculated using Nanozoomer scanning
and software (Hamamatsu). Graft sizes were calculated by anti-GFP staining. All
immunofluorescent images were collected by a Nikon A1 Confocal System attached
to a Nikon Ti-E inverted microscope platform and using a water-immersion Nikon
×60 CFI Plan Apo objective lens with 1.2 NA. Image acquisition was performed at
room temperature using Nikon NIS Elements 3.1 software to capture 12-bit raw
files that were then rescaled to 16-bit images for further processing. All images were
collected as a single scan with the pinhole adjusted to 1 Airy unit at 1,024 × 1,024
pixel density. For figure preparation, images were exported into Photoshop CS3
(Adobe). If necessary, brightness and contrast were adjusted for the entire image
and the image was cropped. Live cell imaging was performed using a Nikon Eclipse
TS100 inverted microscope with white light source and an X-Cite Series 120Q
Laser. For calculation of cardiomyocyte diameter, longitudinally sectioned
cardiomyocytes were chosen for measurement. Transversely or obliquely cut
cardiomyocytes were excluded from morphometric analysis. A point-to-point line was
drawn perpendicular to the cell's long axis at the mid-nucleus level and the diameter
measured using ImageJ software (version 1.47). At least 200 cardiomyocytes were
measured in each animal.

Microcomputerized tomography scanning and image analysis. Microcomputerized
tomography (μCT) was performed as previously described. Microfilled
hearts were imaged in a Skyscan 1076 μCT scanner at 35 μm spatial resolution
using the following settings: 55 kV, 180 μA, 0.5 mm aluminium filter, 220 ms
exposure, rotation step of 0.5°, 180° scan, and 10× frame averaging. Raw scan
data were reconstructed to a three-dimensional slice data set with an isotropic
resolution of 35 μm using the software NRecon version 1.6.1.0 (Skyscan), and
analysed using CTan (Skyscan) and Analyze 10.0 (Mayo Clinic) as follows.
Samples were thresholded to a level where vessels separated into distinct entities 
to allow visualization of individual networks. Non-vascular Microfil (for example, 
in the atria, aorta, coronary sinus, and so on) was digitally removed in Analyze 
using the ‘Image Segmentation’ module. Then, delineation of graft and/or scar 
tissue was drawn in CTan: histological sections of each heart slice (sliced at 2 mm
thickness) stained to highlight the graft and scar (Picrosirius red (scar) and GFP
(graft); see later) were imported into the microcomputerized tomography (μCT)
three-dimensional data set by aligning and replacing two-dimensional CT slices
at 2 mm intervals. The graft and scar regions were manually outlined on these
histological pictures, and the region of interest (ROI) interpolation function in 
CTan extended the ROI from the manually outlined slices across all slices to 
produce a three-dimensional representation of the graft or scar (volume of interest 
(VOI)). The resulting graft/scar VOI was then used to distinguish vessel location 
(that is, graft, scar or uninjured cardiac tissue) in subsequent analyses. Individual 
vessel segmentation to determine vessel identity and branching pattern was per- 
formed using Analyze with the graft/scar VOI imported from CTan. Arterial/ 
venous identity in three-dimensional renderings was assigned by determining 
the origin of each vascular network (for example, aorta, coronary sinus, and so on). 
Statistical analysis. All values are expressed as mean ± s.e.m. Statistical analyses 
were performed using Graphpad Prism software, with the threshold for significance 
set at P < 0.05. For the murine cryopreservation graft analysis and the 
India ink injection experiments, paired t-tests of means were used. 
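
The summary statistics named above can be sketched in a few lines. This is only an illustration, assuming matched measurements held in plain Python lists; the function names and the numbers are hypothetical and are not data from the study, and the analysis itself was run in Graphpad Prism. The sketch stops at the t statistic, because converting it to a P value needs the t distribution from a statistics package.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for one sample."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def paired_t_statistic(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    # A paired test works on the per-pair differences, not the raw groups.
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff, sem_diff = mean_sem(diffs)
    return mean_diff / sem_diff, len(diffs) - 1


# Hypothetical values, for illustration only.
without_suture = [10.0, 12.0, 11.0]
with_suture = [14.0, 15.0, 16.0]
t, df = paired_t_statistic(without_suture, with_suture)
print(f"t = {t:.3f}, df = {df}")
```

The resulting statistic would then be compared against the t distribution with `df` degrees of freedom at the stated threshold of P < 0.05.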
Extended Data Figure 1 | Cryopreservation does not affect hESC-CM 
engraftment. a, Schematic representation of experimental design for 
cryopreservation testing experiments. b, Human genomes detected after 
injection of cryopreserved or non-cryopreserved hESC-CMs were not 
significantly different (P > 0.05, t-test). Mean ± s.e.m. is shown (n = 9 
biological replicates). Experiment was performed once. NS, not significant. 
Extended Data Figure 2 | Creation and validation of the GCaMP3-expressing 
human ES-cell lines. a, Targeting construct for ZFN engineering of 
GCaMP3 into the AAVS1 locus. The endogenous genomic probe and neomycin 
resistance gene probe binding sites used for Southern blotting are shown. 
b, Southern blot analysis demonstrates a single integration event by 
hybridization for neomycin resistance cassette (left) and heterozygous AAVS1 
integration by genomic probe labelling (right). 
Extended Data Figure 3 | Chromosomal analysis of human ES cells ochrome of chromosome 20 long arm (arrow). b, RUES2-GCaMP3 ES- 
Extended Data Figure 4 | Flow cytometry for cardiomyocyte differentiation of human ES cells. Representative histogram of hESC-CMs after differentiation 
shows 73% cTnT-expressing cells. 
D -14: Echocardiography and Myocardial Infarct Creation 
D -5: Immunosuppression commences 
D 0: Echocardiography and Cell Injection 
Endpoint (D14, 28, 84): Echocardiography, necropsy and ex vivo GCaMP3 imaging 
D -14 or D0 through Endpoint: Telemetric Electrocardiogram Recording 

Extended Data Figure 5 | Schematic representation of experimental design. 
Myocardial infarction was created by advancing a balloon catheter into the 
distal left anterior descending artery and inflating it to create ischaemia 
(90 min) followed by reperfusion. The infarct was induced 14 days (D) before 
hESC-CM delivery via left thoracotomy. Immunosuppression using 
cyclosporine A, methylprednisolone and abatacept (T-cell co-stimulatory 
antagonizing fusion protein) was delivered 5 days before cell delivery and 
continued until animals were euthanized. Primary endpoints were (1) 
histologically based morphometric calculations of infarct and graft size with 
analysis of graft composition, and (2) ex vivo analysis of graft-host 
electromechanical coupling enabled by GCaMP3 fluorescence detection. 
Secondary endpoints were (1) detection of arrhythmias by telemetric 
electrocardiogram analysis, and (2) analysis of left ventricular functional 
change by trans-oesophageal echocardiography. 
[Figure panel labels: No Suture; Mattress Suture] 

Extended Data Figure 6 | Technique for hESC-CM injection to infarct 
region and border zones using ‘mattress’ suture strategy. a, The macaque 
infarcted ventricular apex is seen as a blanched region (dotted line) during 
left thoracotomy. A total of 15 aliquots, each containing 100 μl of hESC-CM 
in pro-survival cocktail, were delivered through five epicardial puncture sites 
(arrows, note one further puncture site not seen is on posterior aspect). 
b, hESC-CM retention after injection was increased by use of a ‘mattress’ 
suture. Crosses indicate insertion points of suture with dotted lines 
representing path of suture (exaggerated size for diagrammatic representation). 
A needle tip was inserted into the resulting rectangular area and the suture 
was tightened after a series of three injections (altering the trajectory of 
the needle) but before withdrawal of needle tip. c, Quantification of India ink 
retention after injection into left ventricular myocardium of anaesthetized 
macaques with or without use of the mattress suture technique (n = 3 biological 
replicates each group). A trend favouring greater retention with the mattress 
suture is seen. 
Extended Data Figure 7 | Remuscularization of the infarcted macaque 
heart. a-f, Single channels of confocal immunofluorescence shown in Fig. 1a-f. 
Macaque heart shown was subjected to myocardial infarction and 
transplantation of hESC-CMs 14 days before being euthanized. g-l, Picrosirius 
Red staining of sections in close proximity to confocal immunofluorescence 
in a-f shows lack of fibrosis within hESC-CM grafts. Scale bars: 2,000 μm 
(a, f, g, l); 1,000 μm (b-e, h-k). 
Extended Data Figure 8 | Remuscularized infarct region is composed of 
engrafted cardiomyocytes that increase in size with time. a, Quantification 
of the sarcomeric protein α-actinin expression in GFP-expressing grafts. 
The vast majority (>98%) of GFP-expressing cells co-expressed α-actinin. 
P3-P6 represent individual animals (n = 1) euthanized at 2 weeks (2 wk), 
1 month (m) or 3 months after hESC-CM delivery. Five hundred to seven 
hundred cells were counted from three different graft regions of each heart. 
Percentage of GFP/α-actinin double-positive cells and GFP-positive/α-actinin- 
negative cells are shown as mean ± s.d. b, Normal curve from histograms 
showing the distribution of human ES-cell-derived cardiomyocyte diameters 
(graft) in monkey hearts 2 weeks, 1 or 3 months after cell delivery. 
c-f, Individual histograms with superimposed normal curve of animals 
P3-P6 (as above). 
Extended Data Figure 9 | No evidence of human graft rejection. 
a-i, Representative low- (d-f) and high- (a-c and g-i) power magnification 
of hESC-CM graft 28 days after cell delivery to infarcted macaque heart. 
Representative low- (j-k) and high- (l, m) power magnification of infarct 
region from control macaque 28 days after sham treatment. The hESC-CM 
graft is detected by anti-GFP primary antibody with 3,3’-diaminobenzidine 
(DAB) detection of secondary antibody (brown). Few CD3+ T lymphocytes or 
CD20+ B lymphocytes are seen surrounding the hESC-CM grafts. Comparable 
numbers of T and B cells are seen in control infarcts receiving no hESC-CM 
treatment. Boxed inset regions show areas of higher magnification. 
Duration        QRS morphology                          QRS duration   Rate 
(hr:min:sec)                                            (ms)           (bpm) 
0:16:22         Monomorphic LBBB                        70             300 
0:48:18         Monomorphic LBBB                        70             300 
0:35:56         Monomorphic LBBB                        70             270 
0:5:57          Monomorphic LBBB                        80             300 
0:45:24         Monomorphic LBBB                        80             280 
24:00:00        Polymorphic (sustained monomorphic …)   62-84          220-240 

[Panel b plot: 2-chamber view; x-axis, time after myocardial infarction (wk); series, P3, P4, P5, P6, P7] 

Antibody type       Company          Cat# or Clone   Dilution 
Rabbit polyclonal   Novus            6003008         1:1000 
Mouse monoclonal    Sigma-Aldrich    A7811           1:250 
Mouse monoclonal    DSHB             CcT3            1:1000 
Mouse monoclonal    DSHB             A4.951          1:25 
Mouse monoclonal    Sigma-Aldrich    C3678           1:100 
Mouse monoclonal    Novus            1001770         1:200 
Mouse monoclonal    Abcam            ab9498          1:20 
Rat monoclonal      Serotec          MCA1477         1:2000 
Mouse monoclonal    Dako             M0755           1:2000 

Extended Data Figure 10 | Summary of ventricular tachycardia and 
echocardiographic assessment of left ventricular function. a, Table 
characterizing episodes of ventricular tachycardia after engraftment of 
hESC-CMs (detailed in Fig. 5). Note that P5 demonstrated no discernible 
sinus rhythm on telemetric recording of the ECG 14 days after hESC-CM 
delivery. Although QRS morphology varied, the tachyarrhythmia comprised 
sustained periods of stable monomorphic QRS morphology. LBBB, left 
bundle branch block; RBBB, right bundle branch block. bpm, beats per minute. 
b, Left ventricular function was assessed by trans-oesophageal 
echocardiography at the following time points: before myocardial infarct 
creation, before hESC-CM delivery (2 weeks after myocardial infarction) and 
before animals were euthanized (2, 4 or 12 weeks after myocardial infarction). 
P7 received no cells/vehicle only. All other animals received hESC-CMs. Results 
shown are for left ventricular ejection fraction calculated by two blinded 
cardiologists from the two-chamber view of the left ventricle. Note that this 
view best captures the infarcted antero-apical wall. The vehicle-treated control 
monkey showed a modest diminution in ejection fraction post-infarction. 
The cell-treated animals showed variable responses, with some having 
increased function and some having decreased function. Because of small group 
size, no statistical effects of hESC-CM therapy can be discerned. c, Table of 
antibodies used. DSHB, Developmental Studies Hybridoma Bank. 
Therapeutic targeting of BET bromodomain proteins 
in castration-resistant prostate cancer 


Irfan A. Asangani, Vijaya L. Dommeti, Xiaoju Wang, Rohit Malik, Marcin Cieslik, Rendong Yang, June Escara-Wilke, 
Kari Wilder-Romans, Sudheer Dhanireddy, Carl Engelke, Mathew K. Iyer, Xiaojun Jing, Yi-Mi Wu, Xuhong Cao, 
Zhaohui S. Qin, Shaomeng Wang, Felix Y. Feng & Arul M. Chinnaiyan 


Men who develop metastatic castration-resistant prostate cancer (CRPC) 
invariably succumb to the disease. Progression to CRPC after androgen 
ablation therapy is predominantly driven by deregulated androgen 
receptor (AR) signalling. Despite the success of recently approved 
therapies targeting AR signalling, such as abiraterone and second-generation 
anti-androgens including MDV3100 (also known as 
enzalutamide), durable responses are limited, presumably owing to 
acquired resistance. Recently, JQ1 and I-BET762, two selective small-molecule 
inhibitors that target the amino-terminal bromodomains 
of BRD4, have been shown to exhibit anti-proliferative effects in a 
range of malignancies. Here we show that AR-signalling-competent 
human CRPC cell lines are preferentially sensitive to bromodomain 
and extraterminal (BET) inhibition. BRD4 physically interacts with 
the N-terminal domain of AR, and this interaction can be disrupted by JQ1 (refs 11, 13). 
Like the direct AR antagonist MDV3100, JQ1 disrupted AR recruitment 
to target gene loci. By contrast with MDV3100, JQ1 functions 
downstream of AR, and more potently abrogated BRD4 localization to 
AR target loci and AR-mediated gene transcription, including induction 
of the TMPRSS2-ERG gene fusion and its oncogenic activity. 
In vivo, BET bromodomain inhibition was more efficacious than direct 
AR antagonism in CRPC xenograft mouse models. Taken together, 
these studies provide a novel epigenetic approach for the concerted 
blockade of oncogenic drivers in advanced prostate cancer. 

The identification and therapeutic targeting of co-activators or mediators 
of AR transcriptional signalling should be considered as alternative 
strategies to treat CRPC. BRD4 is a conserved member of the BET family 
of chromatin readers, which includes BRD2, BRD3 and BRDT. BRD4 
has a critical role in transcription by RNA polymerase II (RNA Pol II) by 
facilitating recruitment of the positive transcription elongation factor 
P-TEFb. Similar to other BET-family proteins, BRD4 contains two 
conserved bromodomains, BD1 and BD2. Competitive binding of JQ1 
or I-BET762 to the bromodomain pocket results in the displacement of 
BRD4 from active chromatin and the subsequent removal of RNA Pol II 
from target genes. Although most cancer cells express BET-family 
proteins, it is not clear why only a subset of cell lines from diverse cancers 
responds to BET inhibitors. Recently, BRD4 was shown to interact with 
sequence-specific DNA-binding transcription factors in a gene-specific 
manner. As the genetic and epigenetic landscape differs between tumour 
types, it is possible that distinct transcriptional regulators that associate 
with BRD4 might influence the action of BET inhibitors. 

To discover new treatment options for CRPC, we treated a panel of five 
prostate cancer cell lines and one benign prostate cell line with JQ1, and 
found three of the AR-signalling-positive cell lines to be sensitive to JQ1, 
although all six cell lines express high levels of its target proteins (Fig. 1a 
and Extended Data Fig. 1a, b). Next, knockdown of BRD2, 3 and 4 
(Extended Data Fig. 1c) led to significant inhibition of cell proliferation 
and invasion, phenocopying JQ1 treatment (Extended Data Fig. 1d, e). 
Furthermore, JQ1 treatment induced G0-G1 arrest, apoptosis and associated 
transcriptional downregulation of the anti-apoptotic protein BCL-xl 
(also known as BCL2L1) in AR-positive cells (Fig. 1b and Extended 
Data Fig. 1f-h). Similar to BCL2 downregulation by the BET inhibitor 
I-BET151 in leukaemia, a reduction in BCL-xl by JQ1 could be explained 
in part by the observed loss of BRD2/3/4 recruitment to its promoter 
region (Extended Data Fig. 1j). Even at a relatively low 100 nanomolar 
(nM) concentration, long-term colony formation of AR-positive cells was 
severely inhibited by JQ1 (Extended Data Fig. 1k) with no apparent effect 
on JQ1 target proteins (Extended Data Fig. 1l, m). 

As AR-positive cells were preferentially sensitive to JQ1, we examined 
whether JQ1 has an effect on AR target genes. VCaP human prostate 
cancer cells that harbour the TMPRSS2-ERG gene fusion and AR amplification 
showed a dose-dependent decrease in prostate-specific antigen 
(PSA) and ERG at the messenger RNA and protein levels upon JQ1 
treatment (Fig. 1d, e). Similar effects were observed in LNCaP and 22RV1 
prostate cancer cells (Extended Data Fig. 2a, b). Furthermore, bortezomib 
did not reverse the JQ1-mediated PSA and ERG protein loss, indicating 
that these genes are regulated at the transcriptional level (Extended Data 
Fig. 2c). 

We performed microarray analysis to examine changes in global gene 
expression upon JQ1 treatment. Gene set enrichment analysis (GSEA) 
using the AR gene signature revealed significant repression of these genes 
in AR-positive cells (Fig. 1f), suggesting a role of BET proteins in AR- 
mediated transcription. Additionally, we observed a loss of the MYC- 
associated gene signature in AR-positive cell lines upon JQ1-treatment 
(Extended Data Fig. 2d). MYC is a known transcriptional target of BET 
inhibition in haematological cancers. Interestingly, MYC levels were 
attenuated by JQ1 in cells that are AR positive and sensitive to JQ1 
inhibition, but not in AR-negative cells (Extended Data Fig. 2e). Thus, 
high expression of MYC per se (Extended Data Fig. 1b) does not confer 
sensitivity to JQ1 in prostate cancer cells. Time-course experiments with 
JQ1 demonstrated loss of MYC (Extended Data Fig. 2f, g) and cycloheximide 
had no additional effect on MYC protein levels (Extended Data 
Fig. 2h, i), ruling out a post-translational mode of JQ1 action. Phenotypically, 
knockdown of MYC did not affect cell invasion (Extended Data Fig. 2j), 
whereas JQ1 treatment inhibited invasion (Extended Data Fig. 1e). 
Furthermore, exogenous expression of MYC did not result in rescue of 
JQ1-mediated inhibition of cell growth (Extended Data Fig. 2k, l). Thus, 
although MYC levels may be repressed by JQ1 in AR-positive cells, and 
may have a role in proliferation, MYC does not seem to be the primary 
target for the antineoplastic effects of JQ1. 

As BRD4 is known to engage sequence-specific DNA-binding proteins, 
we proposed that AR may interact directly with BRD4. We performed gel- 
filtration chromatography and found that AR and BRD4 predominantly 
co-eluted in a high-molecular-weight complex (Fig. 2a and Extended Data 
Fig. 3a). Moreover, RNA Pol II, a reported target for phosphorylation by 


1Michigan Center for Translational Pathology, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA. 2Department of Pathology, University of Michigan Medical School, Ann Arbor, 
Michigan 48109, USA. 3Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia 30329, USA. 4Department of Radiation Oncology, University of Michigan Medical School, Ann 
Arbor, Michigan 48109, USA. 5Howard Hughes Medical Institute, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA. 6Departments of Internal Medicine, Pharmacology, and Medicinal 
Chemistry, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA. 7Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA. 8Department 
Figure 1 | Prostate cancer cell lines with intact androgen signalling are 
sensitive to BET bromodomain inhibition. a, Half-maximum inhibitory 
concentration (IC50) for JQ1 in each cell line is shown. b, Induction of apoptosis 
in VCaP prostate cancer cells by JQ1. Cleaved PARP (cPARP) immunoblot 
analysis. GAPDH served as a loading control. DMSO, dimethylsulphoxide. 
c, Quantitative reverse transcription polymerase chain reaction (qRT-PCR) 
analysis of indicated genes in VCaP cells treated with varying concentrations of 
JQ1 for 24 h. Data represent mean ± standard error of the mean (s.e.m.) (n = 3) 
from one of three independent experiments. d, Immunoblot analyses of AR, 
PSA and ERG levels in VCaP cells treated with JQ1. e, GSEA of the AR target 
gene signature in VCaP, LNCaP, 22RV1 and DU145 cells. NS, not significant. 
*P = 0.05, **P = 0.005 by two-tailed Student’s t-test. 
Figure 2 | Physical association of the N-terminal domain of AR with BRD4 and 
its disruption by BET bromodomain inhibition. a, VCaP nuclear extracts were 
fractionated on a Superose-6 column and AR, BRD4 and RNA Pol II were 
analysed by immunoblotting. b, Endogenous association of AR and BRD2/3/4. 
VCaP and LNCaP nuclear extracts were subjected to immunoprecipitation 
using an anti-AR antibody. Immunoprecipitates (IPs) were analysed for 
the presence of BRD2/3/4 by immunoblotting (IB; top). The immunoblot 
was stripped and reprobed for AR (bottom). 5% total lysate was used 
as input control. c, Schematic of BRD4 and AR constructs used for 
co-immunoprecipitation experiments. CTD, C-terminal domain; ET, 
extraterminal domain; DBD, DNA-binding domain; LBD, ligand-binding 
domain. d, NTD of BRD4 interacts with AR. Proteins from 293T cells 
co-transfected with various His-BRD4 deletions and Halo-AR constructs 
were subjected to immunoprecipitation with Halo beads followed by 
immunoblotting with His-tag antibody. Inputs are shown in the bottom panel. 
e, As in d but with the indicated salt concentrations. f, Representative 
sensorgrams from three independent experiments for AR-BRD4 (BD1-BD2) 
by OctetRED biolayer interferometry showing direct interaction. Real-time 
binding was measured by immobilizing biotinylated AR protein on the super 
streptavidin biosensor and subsequent interaction with varying concentrations 
of BRD4 (BD1-BD2) protein. The plots show the response versus protein 
concentration curves derived from the raw binding data. Right, Kd represents 
the BRD4 (BD1-BD2) concentration yielding half-maximal binding to AR. 
Protein RNF2 was used as negative control. g, NTD of AR interacts with 
BD1 of BRD4. Equal amounts of in vitro translated proteins were combined and 
immunoprecipitated using Halo beads followed by immunoblot analysis 
with anti-GST antibody. h, JQ1 disrupts AR-BD1 interactions. Varying 
concentrations of JQ1 were incubated with AR-BD1, NTD1b-BD1 and 
AR-BD2 complexes before immunoprecipitation followed by immunoblot 
analysis. 
BRD4 (ref. 21), also co-eluted in the same complex, suggestive of a 
large multi-protein complex composed of AR, BRD4 and RNA Pol II. 
Immunoprecipitation experiments further confirmed an endogenous 
association between AR and BRD4 (Fig. 2b). Additionally, we observed 
an interaction between AR and BRD2/3 (Fig. 2b), implicating a common 
region in BRD2, 3 and 4 proteins responsible for AR interaction. To map 
the region mediating this interaction, we tested the ability of different 
deletion variants of BRD4 to pull down AR in 293T cells (Fig. 2c). A 
BRD4 variant containing the BD1 and BD2 domains maintained the 
ability to pull down AR even at high salt concentrations (Fig. 2d, e). To 
determine whether the BD1-BD2 domains directly interact with AR, 
we carried out quantitative assessment of the binding affinity using the 
OctetRED system. We applied varying concentrations of BD1-BD2 protein 
to biosensors with immobilized AR and found that BRD4 interacts 
with AR in a concentration-dependent fashion, with an estimated dissociation 
constant (Kd) of 70 nM, supporting a high-affinity interaction 
(Fig. 2f and Extended Data Fig. 3b, c). To fine map this interaction we 
created a series of Halo-tagged AR and glutathione S-transferase (GST)- 
tagged BRD4 constructs for in vitro pull-down studies and demonstrated 
that the BD1, and to a lesser extent the BD2 domain, bind directly to the 
N-terminal domain (NTD) of AR, which was further mapped to 
a 38-amino-acid region (NTD1b) of AR (Fig. 2g and Extended Data 
Fig. 3d-f). Subsequently, we observed the disruption of BD1-AR and 
BD1-NTD1b interactions by JQ1 (Fig. 2h), as well as loss of endogenous 
BRD4-AR interactions (Extended Data Fig. 3g). Together, these data 
indicate that BET protein inhibition leads to disruption of the AR-BRD4 
interaction, which probably explains the preferential activity of JQ1 in 
AR-positive prostate cancer cells. 
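
To make the meaning of the dissociation constant concrete, the following sketch evaluates a simple one-site saturation model, in which the fraction of immobilized AR bound is C / (Kd + C). The one-site model and the function name are assumptions chosen for illustration; the text does not state which model was fitted to the OctetRED data. Only the 70 nM estimate comes from the text.

```python
def fraction_bound(conc_nm, kd_nm):
    """Fraction of sites occupied under a one-site binding model.

    Both arguments are concentrations in nM. This model is an
    illustrative assumption, not the fitting procedure from the paper.
    """
    if conc_nm < 0 or kd_nm <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return conc_nm / (kd_nm + conc_nm)


KD_NM = 70.0  # estimated Kd for BRD4 (BD1-BD2) binding to AR, from the text

# Illustrative concentrations spanning a tenfold range either side of Kd.
for conc in (7.0, 70.0, 700.0):
    print(f"{conc:6.1f} nM -> {fraction_bound(conc, KD_NM):.3f} of maximal binding")
```

Under this model, binding is half-maximal exactly when the BD1-BD2 concentration equals Kd, which matches the definition given in the Figure 2 legend.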

The ubiquitously expressed BRD2, 3 and 4 proteins are suggested to 
have overlapping functions and, consistent with this notion, we 
observed AR interactions with all three. Because BET inhibitors, such 
as JQ1 and I-BET762, have high affinity for the BD1 and BD2 domains 
of BRD2/3/4 proteins, we proposed that BET inhibitors may affect 
genome-wide recruitment of all three BET proteins. We performed 
chromatin immunoprecipitation followed by sequencing (ChIP-seq) 
with antibodies against BRD2/3/4 in VCaP cells treated with JQ1 or 
I-BET762 (Extended Data Fig. 4a) and observed a high genome-wide 
overlap between BRD2, 3 and 4 (62-86% peak overlap) (Extended Data 
Fig. 4b, c). JQ1 or I-BET762 treatment led to a reduction in the recruitment 
of all three proteins to chromatin (Extended Data Fig. 4d). Moreover, 
the reduced BRD2/3/4 recruitment was equally distributed for regions 
with or without AR (Extended Data Fig. 4e). 
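
As an illustration of what a peak-overlap percentage measures, the sketch below computes the fraction of peaks in one set that overlap at least one peak in another. It assumes peaks are given as hypothetical (chromosome, start, end) tuples with half-open coordinates and uses made-up intervals; it is not the pipeline used for the ChIP-seq analysis reported here, and real analyses typically rely on dedicated interval tools.

```python
from collections import defaultdict


def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a overlapping any peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    hits = 0
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks_a) if peaks_a else 0.0


# Hypothetical peak lists, for illustration only.
brd2_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
brd4_peaks = [("chr1", 150, 250), ("chr2", 300, 400)]
print(f"{100 * overlap_fraction(brd2_peaks, brd4_peaks):.1f}% of BRD2 peaks overlap BRD4")
```

Note that the measure is not symmetric: the fraction of BRD2 peaks overlapping BRD4 generally differs from the fraction of BRD4 peaks overlapping BRD2, which is one reason an overlap can be reported as a range.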

Binding of androgen (dihydrotestosterone (DHT)) to AR leads to its 
translocation from the cytoplasm to the nucleus, where it binds to regions 
of DNA harbouring androgen-responsive elements (AREs) and results 
in subsequent recruitment of proteins involved in transcriptional acti- 
vation or suppression in a gene-specific manner. BRD4 interacts with 
acetylated histones as well as DNA-binding transcription factors, leading 
to context-dependent transcriptional activation or inhibition of target 
genes. As the AR-BRD4 interaction is disrupted by JQ1 (Fig. 2), 
we next explored whether AR localization is affected in a genome-wide 
context. We performed ChIP-seq with antibodies against AR, BRD4 and 
RNA Pol II in cells that were starved, treated with DHT, or treated with DHT 
plus JQ1 (Extended Data Fig. 4a). Two anti-androgens, bicalutamide 
and MDV3100, were included for comparison. As expected, the aver- 
age ChIP-seq signal for AR was highly enriched in DHT-treated cells 
(Fig. 3a and Extended Data Fig. 5a, b). Recruitment of AR to target loci 
was markedly attenuated by MDV3100 and less so by bicalutamide. Inter- 
estingly, JQ1 blocked AR recruitment almost as effectively as MDV3100 
(Fig. 3a and Extended Data Fig. 5c-e). Furthermore, we observed a co- 
recruitment of AR and BRD4 at 2,031 sites. The strongest association 
was observed within promoters of AR-regulated genes (502 promoters, 
P=4xX 10 *), and for the highest AR peaks (1,112 sites, P= 1 X 10 °°) 
(Fig. 3b). Limiting our evaluation to AR and BRD4 coincident peaks, we 
observed that DHT-mediated AR recruitment to these loci was inhib- 
ited by MDV3100 and to a lesser extent by JQ1 (Fig. 3c). By contrast, 
Figure 3 | BET bromodomain inhibition disrupts AR and BRD4 binding to 
target loci. a, AR ChIP-seq was performed in VCaP cells treated for 12 h with 
vehicle, DHT (10 nM), DHT plus JQ1 (500 nM), DHT plus MDV3100 (10 μM) 
or DHT plus bicalutamide (25 μM). Summary plot of AR enrichment (average 
coverage) across AR-binding sites (ARBs) in different treatment groups is 
shown. Data represent one of two biological replicates. b, Venn diagram 
illustrating the overlap of AR- and BRD4-enriched peaks in DHT-treated 
sample. c, d, Summary plot for AR and BRD4 enrichment for the AR-BRD4 
overlapping (2,031) regions. e, Genome browser representation of AR, BRD4 
and RNA Pol II binding events on a putative super-enhancer of the AR- 
regulated BMPR1B gene. The y-axis denotes reads per million per base pair. 
The x-axis denotes the genomic position with a scale bar on top right. The 
putative super-enhancer region enriched for AR, BRD4 and RNA Pol II is 
depicted with a black bar on the top left. chr4, chromosome 4. 


JQ1 almost completely abrogated DHT-induced BRD4 recruitment to 
the AR-BRD4 shared loci (Fig. 3d). Examples of gene tracks for AR- 
and BRD4-associated genomic regions such as enhancers and super-enhancers 
and the effects of different treatments on their levels are 
shown in Fig. 3e and Extended Data Fig. 5f. Corroborating the ChIP- 
seq data, gene expression analysis in VCaP and LNCaP cells showed more 
efficient repression of DHT-induced AR-target genes by JQ1 than by 
MDV3100 or bicalutamide (Extended Data Fig. 5g, h). 

JQ1 treatment had a marked effect on ERG expression in VCaP cells 
(Fig. 1d, e and Extended Data Fig. 5h), and we found that the attenuation 
of DHT-induced ERG expression by JQ1 was due to de-recruitment of 
RNA Pol II from the ERG gene body and reduced binding of AR and 
BRD4 on the TMPRSS2 promoter/enhancer (Extended Data Fig. 6a, b). 
The efficient downregulation of ERG by JQ1 has important implications, 
as the TMPRSS2-ERG gene fusion product is the oncogenic driver in 50% 
of prostate cancers. To investigate the effect of JQ1 on ERG-mediated 
transcription, we performed ERG ChIP-seq in cells treated with JQ1 for 
12 h, a time window in which ERG protein levels are still unaffected by 
JQ1 (Extended Data Fig. 6c), and observed a significant loss in the top 4% 
of ERG enriched peaks (Extended Data Fig. 6d). 

We next determined the functional consequences of JQ1 treatment 
by measuring the expression levels of select ERG target genes (Extended 
Data Fig. 6e, f). As expected, the ERG-activated genes were downregu- 
lated and the ERG- repressed genes were de-repressed by JQ1 (Extended 
Data Fig. 6g, h). To evaluate BET inhibitor repression of ERG-mediated 
oncogenic function in an isogenic setting, we overexpressed ERG in 
RWPE and PC3 cells (Extended Data Fig. 7a, b). Treatment with JQ] or 
I-BET762 led to an attenuation of ERG-mediated invasion (Extended 
Data Fig. 7c) and GSEA demonstrated a significant negative enrich- 
ment for ERG target genes upon BET inhibitor treatment (Extended 
Data Fig. 7d). Furthermore, we found that ERG was highly enriched on 
the known distal enhancer of MYC that was reduced upon JQ treatment 
(Extended Data Fig. 8a, b). Likewise, ETV1 occupies the same distal- 
enhancer region in ETV1 fusion-positive LNCaP cells”. Knockdown 
of ERG or ETV1 along with AR led to MYC downregulation, implicat- 
ing MYC regulation by ETS proteins in fusion-positive prostate cancer 
cells (Extended Data Fig. 8c-e). Notably, ChIP-seq analysis of AR and 
RNA Pol II enrichment at the MYC locus presented an interesting 
pattern in which DHT treatment led to increased AR and reduced RNA 
Pol II binding on the MYC distal enhancer and gene body, respectively, 
which was reinstated in the presence of MDV3100 or bicalutamide but 
not JQ1 (Extended Data Fig. 8f). This observation is consistent with the 
concomitant reduction in MYC expression upon DHT treatment that 
was de-repressed in the presence of MDV3100 but not JQ1 (Extended 
Data Fig. 8g-i). Lack of de-repression of MYC by JQ1 in this setting 
could be explained by the fact that both AR and ERG are absent from 
the MYC distal enhancer, leading to net loss of MYC expression. These 
data also suggest a mechanism by which CRPC patients become resistant 
to anti-androgen therapy by maintaining expression of the MYC oncogene. 
Next, we sought to compare the efficacy of JQ1 and MDV3100, a direct 
AR antagonist used clinically to treat advanced CRPC. Before embarking 
on the in vivo experiment we tested them on VCaP cells in vitro for 8 days 
and observed marginal cell death by MDV3100 versus suppression of cell 
growth at sub-micromolar concentrations of JQ1 (Extended Data Fig. 9a). 
To rule out the possibility of JQ1 being a generic anti-androgen we confirmed 
that JQ1 had no effect on physiological androgen-regulated processes; 
however, JQ1 reduced testes size in mice, as reported previously 
(Extended Data Fig. 9b-f). Treatment of VCaP tumour-bearing mice with 
JQ1 led to a significant reduction in tumour volume/weight (Fig. 4a, b and 
Extended Data Fig. 10a), whereas MDV3100 had a less pronounced effect. 

Recently, several studies described the pro-metastatic effects of MDV3100 
in pre-clinical models. To test whether MDV3100 treatment leads to 
spontaneous metastasis in our VCaP xenograft model, we isolated femur, 
liver and spleen from MDV3100-treated mice and found evidence of meta- 
stases in femur and liver (Extended Data Fig. 10b, c). By contrast, JQ1- 
treated mice showed no evidence of metastasis (Extended Data Fig. 10c). 
Taken together, these pre-clinical studies suggest that the use of MDV3100 
in clinically localized prostate cancer may potentiate the formation of 
micro-metastases, unlike BET inhibitors. Consistent with previous reports, 
JQ1 and MDV3100 were both well tolerated by mice (Extended Data 
Fig. 10d). Although VCaP cells were originally derived from a patient with 
CRPC, VCaP tumour xenografts respond to castration in mouse models. 
We found that JQ1 still had a growth inhibitory effect in castration- 
resistant VCaP tumour xenografts and observed a 50% reduction in 
castration-resistant tumours by JQ1-treatment (Fig. 4c and Extended 
Data Fig. 10e). 

Maintenance of AR signalling is the most common resistance mechanism 
that patients with advanced prostate cancer develop after conventional 
hormonal treatments. AR amplification, mutation and alternative 
splicing have all been suggested as potential resistance mechanisms to 
anti-androgen treatments. Over half of CRPC patients have at least 
one of these aberrations in the AR pathway. As BET inhibitors function 
‘downstream’ of AR (Fig. 4d), our data indicate that these compounds 
may be effective in the context of AR-mediated resistance, including 
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compensatory mechanisms involving related steroid hormone recep- 
tors that are also likely to require BET bromodomain function. By func- 
tioning downstream of AR, BET inhibition is less likely to be affected 
by acquired resistance associated with AR antagonists, including the 
recently identified F876L mutation of AR”. Although both MDV3100 
and JQ1 block AR recruitment to target loci on a genome-wide scale 
(the ‘AR cistrome’), we found that JQ1 probably has an enhanced inhibitory
effect by further abrogating co-recruitment of BRD4, which is
required for mobilization of the transcriptional machinery’**’*. 

A recent study demonstrated that BET inhibition leads to preferential 
loss of BRD4 at super-enhancers and consequent transcriptional elonga- 
tion defects’”. These super-enhancers were often associated with key onco- 
genic drivers in a variety of cancers. Tumour cells are thought to become 
addicted to select oncogenes and hence unusually reliant on their high 
expression, which may explain the preferential sensitivity of BET inhibi- 
tion in cancer versus normal tissues. Although MYC and its association 
with multiple myeloma was highlighted as a super-enhancer-dependent 
cancer’’, this framework probably applies to key transcription factors 
involved in the development of CRPC, including AR, ETS and MYC 
(Fig. 4d). Taken together, these data indicate that clinical evaluation of 
BET inhibitors is warranted in CRPC, either as monotherapy or in com- 
bination with second-generation anti-androgens. 


METHODS SUMMARY 


Gene expression profiling was performed using the Agilent Whole Human Genome 
Oligo Microarray following the manufacturer’s protocol. Proteins were extracted by 
lysing the cells in RIPA lysis buffer (Sigma) supplemented with protease inhibitor 
cocktail (Sigma). Immunoblotting was performed with standard protocols using poly- 
vinylidene difluoride (PVDF) membrane (GE Healthcare), and signals were visualized 
with an enhanced chemiluminescence system as described by the manufacturer (GE 
Healthcare). The ChIP assays for BRD2, BRD3, BRD4, AR, RNA Pol II, ERG and 
H3K27ac with specific antibodies were performed using HighCell ChIP kit (Diagenode) 
following the manufacturer’s protocol. All procedures involving mice were approved 
by the University Committee on Use and Care of Animals at the University of Michigan 
and conform to all regulatory standards. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 

METHODS 

Cell culture. VCaP prostate cancer cells were grown in DMEM with Glutamax 
(Gibco); LNCaP, 22RV1, DU145 and PC3 prostate cancer cell lines were grown in 
RPMI 1640; all were supplemented with 10% FBS (Invitrogen) in a 5% CO₂ cell culture
incubator. The immortalized benign prostate cell line RWPE-1 was grown in keratinocyte
media with supplements (Lonza). All cell lines were tested and found to be free
of mycoplasma contamination. 

Cell viability assay. Cells were seeded in 96-well plates at 2,000-10,000 cells per well 
(optimum density for growth) in a total volume of 100 µl media containing 10% FBS.
Serially diluted compounds in 100 µl media were added to the cells 12 h later. After
96 h incubation, cell viability was assessed by CellTiter-Glo (Promega). The values
were normalized and IC50 was calculated using GraphPad Prism software. For the long-term
colony formation assay, 10,000-50,000 cells per well were seeded in 6-well plates
and treated with either 100 nM or 500 nM of JQ1 or DMSO. After 12 days, cells were
fixed with methanol, stained with crystal violet and photographed. For colorimetric
assays, the stained wells were treated with 500 µl 10% acetic acid and the absorbance
was measured at 560 nm using a spectrophotometer. 

Cell cycle analysis. Cells were grown in 6-well plates and treated with varying 
concentrations of JQ1. For cell cycle analysis, cells were washed 48 h post-treatment 
with PBS and fixed in 70% ethanol overnight. The cells were washed again with PBS, 
stained with propidium iodide and analysed by flow cytometry. 

RNA interference. For knockdown experiments, cells were seeded in 6-well plates and
transfected with 100 nM ON-TARGETplus SMARTpool siRNA (Thermo Scientific)
targeting BRD2, BRD3, BRD4, MYC or non-targeting control (Non-targeting
Pool, catalogue no. D-001810-10-50) using oligofectamine (Invitrogen) according to
the manufacturer’s instructions. The following are the catalogue numbers and the
siRNA sequences: ON-TARGETplus Human BRD2 SMARTpool, catalogue no.
L-004935-00-0005, target sequences, CACGAAAGCUACAGGAUGU,
GGGCCGAGUUGUGCAUAUA, CCUAAGAAGUCCAAGAAAG,
GUCCUUUCCUGCCUACGUA; ON-TARGETplus Human BRD3 SMARTpool, catalogue no.
L-004936-00-005, target sequences, AAUUGAACCUGCCGGAUUA, CGGCUGAUGUUCUCGAAUU,
GGAGAGAUAUGUCAAGUCU, GCGAAUGUAUGCAGGACUU; ON-TARGETplus Human
BRD4 SMARTpool, catalogue no. L-004937-00-0005, target sequences,
AAACCGAGAUCAUGAUAGU, CUACACGACUACUGUGACA,
AAACACAACUCAAGCAUCG, CAGCGAAGACUCCGAAACA; and ON-TARGETplus Human
MYC SMARTpool, catalogue no. L-003282-00-0005. Cells were trypsinized 24 h
post-transfection and used in cell proliferation and Matrigel invasion assays as well
as for RNA extractions to determine the knockdown efficiency. 

For AR knockdown, ON-TARGETplus Human AR SMARTpool, catalogue no.
L-003400-00-0005, target sequences, GAGCGUGGACUUUCCGGAA,
UCAAGGAACUCGAUCGUAU, CGAGAGAGCUGCAUCAGUU,
CAGAAAUGAUUGCACUAUU, was used at 100 nM concentration; for ERG knockdown, siRNA from
Dharmacon, catalogue no. D-003886-01-0050, was used; and for ETV1 knockdown,
a mix of ETV1 siRNA ID s4854, catalogue no. 4392420, and ETV1 siRNA ID s4855,
catalogue no. 4392420, from Life Technologies was used at 100 nM concentration for
transfection using oligofectamine. 

Cell proliferation assay. For cell proliferation assays after siRNA knockdown, 20,000 
cells per well were seeded in 24-well plates (n = 3) and cells were harvested and 
counted at the indicated time points by Coulter counter (Beckman Coulter). 

VCaP, LNCaP and 22RV1 cells were transduced with either Ad-c-MYC (Vector
Biolabs, catalogue no. 1285) or LacZ control adenoviral particles. Twenty-four hours
after infection, equal numbers of cells were seeded in 24-well plates and treated with
vehicle, JQ1 or I-BET762 at 500 nM concentration. Cells were counted at the indicated
time points by Coulter counter. 
Matrigel invasion assays. Twenty-four hours after transfection with siRNA or 500 nM
JQ1 treatment, 0.2 × 10⁶ VCaP or 0.1 × 10⁶ LNCaP cells were seeded in a transwell
chamber pre-coated with Matrigel (BD Biosciences). Medium containing 10% FBS in
the lower chamber served as chemoattractant. In the case of JQ1, 500 nM compound
was added to both upper and lower chambers. After 48 h, the non-invading cells and
extracellular matrix were gently removed with a cotton swab and invasive cells located
on the lower side of the chamber were stained with crystal violet, air dried,
photographed and counted. 

PC3 and RWPE cells were treated with JQ1 or I-BET762 at 500 nM concentration
along with DMSO control for 24 h before seeding 50,000 cells per well in a transwell
chamber pre-coated with Matrigel along with the corresponding drugs used for treatment.
Medium containing 10% FBS in the lower chamber served as chemoattractant.
After 48 h, the non-invading cells and extracellular matrix were gently removed with a
cotton swab and invasive cells located on the lower side of the chamber were stained
with crystal violet, air dried and photographed. For colorimetric assays, the inserts
were treated with 150 µl of 10% acetic acid and the absorbance measured at 560 nm
using a spectrophotometer (GE Healthcare). 

RNA isolation and quantitative real-time PCR. Total RNA was isolated from cells 
using RNeasy Mini Kit (Qiagen) and cDNA was synthesized from 1,000 ng total RNA
using SuperScript III First-Strand Synthesis SuperMix (Invitrogen). qPCRs were performed
in duplicate or triplicate using TaqMan assays (Applied Biosystems) or standard
SYBR green reagents and protocols on a StepOnePlus Real-Time PCR system
(Applied Biosystems). The target mRNA expression was quantified using the ΔΔCt
method and normalized to GAPDH expression. All primers were designed using
Primer3 (http://frodo.wi.mit.edu/primer3/) and synthesized by Integrated DNA
Technologies. The primer sequences for SYBR green qPCR and the catalogue numbers for
TaqMan assays used are as follows: BRD2 qPCR fwd, CTACGTAAGAAACCCCGGAAG;
BRD2 qPCR rev, GCTTTTTCTCCAAAGCCAGTT; BRD3 qPCR
fwd, CCTCAGGGAGATGCTATCCA; BRD3 qPCR rev, ATGTCGTGGTAGTCGTGCAG;
BRD4 qPCR fwd, AGCAGCAACAGCAATGTGAG; BRD4 qPCR rev,
GCTTGCACTTGTCCTCTTCC; ERG qPCR fwd, CGCAGAGTTATCGTGCCAGCAGAT;
ERG qPCR rev, CCATATTCTTTCACCGCCCACTCC; PSA(KLK3) qPCR
fwd, ACGCTGGACAGGGGGCAAAAG; PSA(KLK3) qPCR rev, GGGCAGGGCACATGGTTCACT;
TMPRSS2 qPCR fwd, CAGGAGTGTACGGGAATGTGATGGT;
TMPRSS2 qPCR rev, GATTAGCCGTCTGCCCTCATTTGT; FKBP5 qPCR
fwd, TCTCATGTCTCCCCAGTTCC; FKBP5 qPCR rev, TTCTGGCTTTCACGTCTGTG;
SLC45A3 qPCR fwd, TCGTGGGCGAGGGGCTGTA; SLC45A3 qPCR
rev, CATCCGAACGCCTTCATCATAGTGT; BMPR1B qPCR fwd, CCACCATTGTCCAGAAGACTC;
BMPR1B qPCR rev, GCAACCCAGAGTCATCCTCTT;
MYC qPCR fwd, GCTCGTCTCAGAGAAGCTGG; MYC qPCR rev, GCTCAGATCCTGCAGGTACAA;
AR qPCR fwd, CAGTGGATGGGCTGAAAAAT; AR qPCR
rev, GGAGCTTGGTGAGCTGGTAG; ETV1 qPCR fwd, GCAAGAAGGCTTCCTGGCTCAT;
ETV1 qPCR rev, CCTTCCCGATACATTCCTGGCT; GAPDH qPCR
fwd, TGCACCACCAACTGCTTAGC; GAPDH qPCR rev, GGCATGGACTGTGGTCATGAG;
MYC dis.enh ChIP-PCR fwd, TGGCAACTTCTGCCTGTGTA;
MYC dis.enh ChIP-PCR rev, CAGGCAGGGAGGAAGTCAAT; MYC upstream
ChIP-PCR fwd, CCAGGACAAATGACCACACA; MYC upstream ChIP-PCR rev,
CCCTTGGCAAACATCAACTT; TaqMan primer probes TDRD1, catalogue no.
Hs00229805_m1; CACNA1D, catalogue no. Hs00167753_m1; ARHGDIB, catalogue
no. Hs00171288_m1; NDRG1, catalogue no. Hs00608387_m1; VCL, catalogue no.
Hs00419715_m1; KRT8, catalogue no. Hs01595539_g1; MALAT1, catalogue no.
Hs00273907_s1; BCL-xl qPCR, catalogue no. Hs00236329_m1; WNT2 qPCR, catalogue
no. Hs00608224_m1; CRISP3 qPCR, catalogue no. Hs00195988_m1. 
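
The ΔΔCt quantification mentioned above can be sketched as follows. This is a minimal illustration, assuming the common 2^-ΔΔCt form with roughly 100% amplification efficiency and GAPDH as the reference gene; the function name and the Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs control) by the 2^-ddCt method.

    Assumes ~100% primer efficiency, so one cycle equals a twofold change.
    """
    # Normalize target Ct to the reference gene (e.g. GAPDH) within each sample.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between treated and control samples.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold two cycles later
# after treatment, while the reference gene is unchanged.
print(relative_expression(26.0, 18.0, 24.0, 18.0))  # 0.25
```

A value below 1 indicates lower target expression in the treated sample relative to control.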
Antibodies and immunoblot analyses. Antibodies used in the immunoprecipita- 
tion (IP) and immunoblotting (IB) assays are AR IP, IB (Abcam catalogue no. 
ab74272); RNA Pol II IB (Abcam catalogue no. ab5408); BRD2 IB (Abnova catalogue 
no. PAB3245); BRD3 IB (SantaCruz catalogue no. sc-81202); BRD4 IB (Bethyl cata- 
logue no. A301-985A); ERG IB (Epitomics catalogue no. 2805-1); MYC IB (Sigma 
catalogue no. M5546); PSA IB (Dako catalogue no. A0562); GST IB (GE Life Science 
catalogue no. 27-4577-01); Halo IP, IB (Promega catalogue no. G9281); Poly Histidine 
IP, IB (Sigma catalogue no. H1029); BCL-xl IB (Cell Signaling catalogue no. 2762); 
cPARP IB (Cell Signaling catalogue no. 9541); GAPDH (14C10) IB (Cell Signaling 
catalogue no. 3683 s). All antibodies were used at dilutions suggested by the manufacturers.
For western blot analysis, 200 µg total protein extract was boiled in sample
buffer and 10-20 µg aliquots were separated by SDS-PAGE and transferred onto
polyvinylidene difluoride membrane (GE Healthcare). The membrane was incubated 
for 1h in blocking buffer (Tris-buffered saline, 0.1% Tween (TBS-T), 5% non-fat dry 
milk) followed by incubation overnight at 4°C with the primary antibody. After a wash 
with TBS-T, the blot was incubated with horseradish peroxidase (HRP)-conjugated 
secondary antibody and signals were visualized by enhanced chemiluminescence system 
as per the manufacturer’s protocol (GE Healthcare). 

Immunoprecipitations. For endogenous immunoprecipitation experiments, nuclear
extracts were obtained from VCaP and LNCaP cells using NE-PER nuclear extraction
kit (Thermo Scientific). Nuclear pellet was then lysed in IP buffer (20 mM Tris
pH 7.5, 150 mM NaCl, 1% Triton X-100, protease inhibitor) by sonication. Nuclear
lysates (0.5-1.0 mg) were pre-cleared by incubation with protein G Dynabeads (Life
Technologies) for 1 h on a rotator at 4 °C. Five micrograms of antibody was added
to the pre-cleared lysates and incubated on a rotator at 4 °C overnight before the
addition of protein G Dynabeads for 1 h. Beads were washed three times in IP buffer
and resuspended in 40 µl of 2× loading buffer and boiled at 90 °C for 10 min for
separation of the protein and beads. Samples were then analysed by SDS-PAGE
and western blotting as described earlier. For endogenous competitive assays, the
VCaP cells were incubated with 5 or 25 µM JQ1 for 6 h before nuclear protein
extractions. 

For co-immunoprecipitation experiments in 293T cells, plasmids encoding various
deletion mutants of BRD4 in pCDNA4c (Addgene) and full-length AR in pFN21
plasmid (Promega) were transfected using Fugene 6.0 HD (Roche) according to the 
manufacturer’s instructions. Twenty-four hours after transfection, total proteins were 
extracted using IP buffer supplemented with protease inhibitor cocktail mix (Sigma) 
and the expressions of the corresponding proteins were analysed by immunoblotting. 
Immunoprecipitation using Halo beads followed by immunoblotting with anti-His 
antibody were performed as described earlier. 

Cell-free protein-protein interaction studies. In vitro protein expression was
carried out by cloning the desired expression cassettes downstream of a Halo or GST
tag to produce fusion proteins. Briefly, AR and its subdomains were cloned into the
pFN2K vector containing N-terminal GST sequence (catalogue no. G1891, Promega);
BRD4 and its subdomains were cloned into the pFN19A vector containing N-terminal 
Halo sequence (catalogue no. C8461, Promega). After cloning, the fusion proteins were 
expressed using the cell-free transcription and translation system (catalogue no. L5030, 
Promega) following the manufacturer’s protocol. For each reaction, protein expression 
was confirmed by western blot. 

A total of 10 µl cell-free reaction containing Halo- and GST-tag fusion proteins was
incubated in PBST (0.1% Tween) at 4 °C overnight. Ten microlitres of HaloLink beads
(catalogue no. G931, Promega) were blocked in BSA at 4 °C overnight. After washes
with PBS, the beads were mixed with AR-BRD4 mixture and incubated at room
temperature for 1 h. HaloLink beads were then washed with PBST four times and
eluted in SDS loading buffer. Proteins were separated on SDS gel and blotted with
anti-GST antibody (GE Healthcare). For competitive assay, AR-BD1, NTD1b-BD1
and AR-BD2 mixture was incubated in the presence of varying doses of JQ1
compound. 

AR-BRD4 direct interaction assays by OctetRED. The binding affinity between 
AR and BRD4 was determined by biolayer interferometry technology using the
OctetRED system (ForteBio). Recombinant AR protein (catalogue no. AR-8486H,
Creative Biomart) was biotinylated by EZ-Link NHS-PEG4 Biotinylation Kit (catalogue
no. 21329, Thermo Scientific) following the manufacturer’s protocol, and any
unincorporated biotin was removed from the reactions with Zeba 2 ml desalt columns.
Biotinylated proteins (5 µg ml⁻¹) were then incubated with super-streptavidin biosensors
(catalogue no. 18-5057, ForteBio) in binding buffer (20 mM HEPES pH 7.4,
150 mM NaCl) and washed three times in binding buffer. BRD4 (BD1-BD2) protein 
(catalogue no. 31047, BPS Biosciences) was serially diluted in binding buffer, and the 
AR-BRD4 association/dissociation was monitored by OctetRED for 10 min at 25 °C. 
Non-specific binding was controlled by subtracting the signal obtained from AR- 
RNF2 interactions from that of AR-BRD4 interactions and baseline signal drift was 
controlled by monitoring immobilized AR without BRD4. OctetRED analysis software 
was used to analyse the data. 

Gene expression array analysis. VCaP, LNCaP, 22RV1 and DU145 cells were
treated with 500 nM JQ1 for 24 h and total RNA was extracted using RNeasy Mini
Kit (Qiagen) for gene expression array analysis. For anti-androgen comparative study,
VCaP and LNCaP cells were grown in media containing 10% charcoal-stripped serum
for 48 h followed by pre-treatment with 500 nM JQ1, 10 µM MDV3100 or 25 µM
bicalutamide for 6 h and stimulated with 10 nM DHT (androgen) for 18 h. Cells
treated with only vehicle or 10 nM DHT served as controls. For determining the effect
of BET inhibitors in isogenic ERG system, RWPE-ERG and PC3-ERG cells were
treated with 500 nM JQ1 or I-BET762 for 24 h. Expression profiling was performed
using the Agilent Whole Human Genome Oligo Microarray according to the manufacturer’s
protocol. All samples were run in technical duplicates or quadruplicates
against control. Over- and underexpressed gene sets were generated by filtering to
include only data points that showed twofold average over- or underexpression (log
ratio with P < 0.001) in all hybridizations. 
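
The filtering rule above can be illustrated with a short sketch. It assumes that log ratios are base 2 and that "in all hybridizations" means every replicate must pass both the fold-change and the P-value cut-off; the function name `classify_probe` and the example values are hypothetical, and the actual pipeline may differ.

```python
import math


def classify_probe(log2_ratios, p_values, fold=2.0, p_cutoff=0.001):
    """Return 'over', 'under' or None for one probe across all hybridizations.

    log2_ratios and p_values hold one entry per hybridization (assumed layout).
    """
    threshold = math.log2(fold)  # twofold corresponds to a log2 ratio of 1
    # Every hybridization must be significant.
    if not all(p < p_cutoff for p in p_values):
        return None
    # Every hybridization must change in the same direction by >= fold.
    if all(r >= threshold for r in log2_ratios):
        return "over"
    if all(r <= -threshold for r in log2_ratios):
        return "under"
    return None


# Hypothetical probes measured in two hybridizations each.
print(classify_probe([1.2, 1.5], [1e-4, 1e-5]))    # over
print(classify_probe([-1.1, -2.0], [1e-4, 1e-4]))  # under
print(classify_probe([1.2, 0.5], [1e-4, 1e-4]))    # None
```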

GSEA was performed using the JAVA program (http://www.broadinstitute.org/ 
gsea) as described previously”. 

The AR target gene signature used in GSEA analysis was generated from common 
upregulated genes in VCaP and LNCaP upon DHT treatment and the gene list is as 
follows: ABCC4, ABHD2, ACSL3, ADARB2, AF349445, AFF4, AI089002, AI207522, 
AI570240, AK023660, AK025360, AK055915, AK057576, AK074291, AK092594, 
AK093002, AK098478, AK124281, AK124426, AL533190, AL713762, ALDH1A3,
AMAC1L2, ANKRD37, ANXA2, ARSG, ASRGL1, ATP10A, ATP1A1, ATP1A4,
ATRNL1, AUTS2, AW029229, AW389914, AZGP1, B3GAT1, BC039021, BC041926,
BC041955, BC055421, BC062780, BG462058, BG618474, BI710972, BM469851,
BMPR1B, BQ017638, BQ706262, BRP44, BU567141, BU753102, BX099483,
C10orf114, C14orf162, C16orf30, C18orf1, C1orf108, C1orf113, C1orf26, C20orf112,
C6orf81, CA314451, CA414006, CBLL1, CCDC4, CDC14B, CDC14C, CDYL2, 
CEBPD, CENPN, ChGn, CHIA, CHKA, CHST2, CLDN12, CLDN14, CLDN8, 
CTBP1, CUTL2, CXorf9, CYP1A1, CYP2U1, DDR2, DHCR24, DKFZp761P0423, 
DNAJB9, DOCK11, DOCK8, EAF2, EDG7, ELL2, ELOVL5, ELOVL7, EMP1, 
ENDOD1, ENST00000358356, ERN1, ERRFI1, F2RL1, FAM13A10S, FER1L3,
FGD4, FKBP5, FLJ31568, FLJ39502, FRK, FZD5, GADD45G, GIPR, GREB1, GSR,
HERC3, HLA-DRB3, HOMER2, HPGD, HS3ST4, HSD17B2, IFI6, IGF1, IGF1R,
IL20RA, IMPAD1, INPP4B, KCNMA1, KLF15, KLK3, KLK4, KLK5, KRT18,
KRT19, KRT72, LAMA1, LDLR, LIFR, LOC205251, LOC401708, LOC641467,
LOC646282, LOC730498, LONRF1, LOX, LRCH1, LRIG1, LSS, MAF, MAK,
MALT1, MAP1B, MAP7D1, MBOAT2, MFSD2, MICAL1, MLPH, MOGAT2,
MPZL1, MTMR9, NANOGP1, NAT1, NCAPD3, NDFIP2, NDRG1, NEBL, NEK10, 
NFKBIA, NNMT, NR4A1, NY-REN-7, ODC1, OLAH, ORM1, ORM2, OTUD7B, 
PACS1, PDLIM5, PECI, PER1, PFKFB2, PGC, PHACTR3, PNPLA8, PPP2CB,


RAB27A, RAB4A, RASD1, RHOU, RUNX1, S100A5, SCRG1, SGK, SHROOM3, 
SLC16A6, SLC26A2, SLC26A3, SLC2A14, SLC2A3, SLC38A4, SLC41A1, SLC45A3, 
SLITRK6, SMC4, SMOC1, SNAI2, SNTG2, SOCS2, SPDEF, SPDYA, SPINK5L3, 
SPOCK1, SPTB, ST6GALNAC1, STEAP4, STK17B, TACC1, TBRG1, TBX15, TG,
TGFB2, TIPARP, TLOC1, TMCC3, TMPRSS2, TNFAIP3, TPD52, TRIM36,
TRIM63, TTN, TUBA3D, WIPI1, WNT7B, WWTR1, X03757, ZBTB1, ZBTB16 and
ZBTB24. 

The ERG gene signature was generated by extracting twofold upregulated genes 
from RWPE and PC3 cells stably expressing ERG compared with respective LacZ- 
expressing cells. GSEA was performed using this gene set on gene expression data 
obtained from the JQ1- and I-BET762-treated RWPE and PC3 cells. We also ran 
GSEA using a gene set that was not changed upon expression of ERG to exclude the 
possibility that treatment with JQ1 and I-BET762 may change gene expression in a 
non-specific fashion. All of the gene expression array data (total 48) can be found at the 
Gene Expression Omnibus under accession number GSE55064. 

ChIP and ChIP-seq. The ChIP assays for BRD2, BRD3, BRD4, AR, RNA Pol II, ERG 
and H3K27ac were performed using HighCell ChIP kit (Diagenode) according to the 
manufacturer’s protocol. The antibodies used for ChIP assay are AR PG-21 (Millipore 
catalogue no. 06-680); RNA Pol II (Abcam catalogue no. ab5408); BRD2 (Bethyl cata- 
logue no. A302-583A); BRD3 (Bethyl catalogue no. A302-368A); BRD4 (Bethyl 
catalogue no. A301-985A); H3(acetyl K27) (Abcam catalogue no. ab4729) and IgG 
(Diagenode). For BRD2/3/4 ChIP-seq experiments with BET inhibitors, VCaP
cells were treated with 500 nM JQ1 or I-BET762 for 12 h. For AR signalling ChIP-seq
experiments, VCaP cells were grown in charcoal-stripped serum containing
media for 48 h followed by 6 h pre-treatment with vehicle or 500 nM JQ1 or
10 µM MDV3100 or 25 µM bicalutamide and then stimulated with 10 nM DHT
for 12 h. For ERG ChIP-seq studies, VCaP cells were treated with 500 nM JQ1 or
vehicle for 12 h. Next, cells were crosslinked for 10 min with 1% formaldehyde.
Crosslinking was terminated by the addition of 1/10 volume 1.25 M glycine for
5 min at room temperature followed by cell lysis and sonication (Bioruptor,
Diagenode), resulting in an average chromatin fragment size of 200 bp. Chromatin
equivalent to 5 × 10⁶ cells was used for ChIP using various antibodies. ChIP DNA
was isolated (IPure Kit, Diagenode) from samples by incubation with the antibody at
4 °C overnight followed by washing and reversal of crosslinking. The ChIP-seq sample
preparation for sequencing was performed according to the manufacturer’s instructions
(Illumina). ChIP-enriched DNA samples (1-10 ng) were converted to blunt-ended
fragments using T4 DNA polymerase, E. coli DNA polymerase I large fragment
(Klenow polymerase) and T4 polynucleotide kinase (New England BioLabs (NEB)). A
single A base was added to fragment ends by Klenow fragment (3′ to 5′ exo minus;
NEB) followed by ligation of Illumina adaptors (Quick ligase, NEB). The adaptor- 
modified DNA fragments were enriched by PCR using the Illumina Barcode primers 
and Phusion DNA polymerase (NEB). PCR products were size selected using 3% 
NuSieve agarose gels (Lonza) followed by gel extraction using QIAEX II reagents 
(Qiagen). Libraries were quantified with the Bioanalyzer 2100 (Agilent) and 
sequenced on the Illumina HiSeq 2000 Sequencer (100-nucleotide read length). 
ChIP-seq analysis 

ChIP-seq enrichment levels. ChIP enrichment levels within a peak (or site) were 
calculated from the sequencing data as follows. First, reads were aligned to the HG19 
reference genome using Bowtie2” with all default settings. Second, aligned reads were 
sorted using NovoSort and exact duplicates were removed using Samtools”’. Third, for 
each peak (site) overlapping reads were counted and this count was divided by the 
length of the peak or site. Fourth, to correct for differences in sequencing depth and 
alignment coverage the values are further normalized by the number of aligned reads 
per million. 
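
The third and fourth steps above (count overlapping reads, divide by peak length, normalize per million aligned reads) can be sketched as follows. This is an illustrative outline only: it assumes half-open (start, end) coordinates on a single chromosome, and the function names and numbers are hypothetical rather than taken from the study's scripts.

```python
def count_overlapping_reads(peak, reads):
    """Count reads overlapping a peak; intervals are half-open (start, end)."""
    start, end = peak
    return sum(1 for r_start, r_end in reads if r_start < end and r_end > start)


def enrichment_level(read_count, peak_length_bp, total_aligned_reads):
    """Reads per base pair of the peak, per million aligned reads."""
    reads_per_bp = read_count / peak_length_bp
    return reads_per_bp / (total_aligned_reads / 1_000_000)


# Hypothetical peak and reads on one chromosome.
peak = (100, 200)
reads = [(90, 110), (150, 250), (200, 300), (0, 100)]
n = count_overlapping_reads(peak, reads)  # the last two reads only touch the peak edges
print(n)  # 2

# Hypothetical library of 20 million aligned reads, 500 reads in a 1 kb peak.
print(enrichment_level(500, 1000, 20_000_000))
```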

ChIP-seq reproducibility plots. To assess the biological variability of AR and ERG 
ChIP-seq experiments, we compared enrichment levels of their respective replicates. 
For each replicate we called peaks using MACS with all default settings against an IgG
control. We excluded peaks within genomic regions prone to technical artefacts**. For 
each replicate pair we defined a set of concordant peaks as those overlapping in both 
replicates. For each concordant peak we calculated enrichment levels within the union 
of the two overlapping peaks. The scatter plots include all peaks with enrichment levels 
up to the 99th percentile. 

Overlaps of bromodomain proteins. We compared the genome-wide distribution 
of BRD2, BRD3 and BRD4 peaks in DMSO-treated VCaP cells. First, we called peaks
for each of the proteins using MACS with all default settings and IgG control. Because 
we were interested in peaks that are possibly biologically significant we used a mod- 
erately stringent significance cut-off (MACS score >100). Next, we identified all 
genomic regions that were enriched for at least one of the proteins. Specifically, we 
‘reduced’ all stringent peaks using GenomicRanges”. For each of those regions we 
established which of the bromodomain proteins were enriched to count the number of 
overlaps. 
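
The ‘reduce’ step and the subsequent overlap count can be illustrated in plain Python. This sketch is analogous to, but not a reimplementation of, the GenomicRanges operation named above; it assumes half-open intervals on a single chromosome, merges overlapping or directly adjacent peaks, and uses hypothetical peak coordinates.

```python
def reduce_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals into regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the current region when the next peak overlaps or abuts it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


def proteins_per_region(peaks_by_protein):
    """Map each merged region to the set of proteins with a peak inside it."""
    all_peaks = [p for peaks in peaks_by_protein.values() for p in peaks]
    result = {}
    for r_start, r_end in reduce_intervals(all_peaks):
        result[(r_start, r_end)] = frozenset(
            name
            for name, peaks in peaks_by_protein.items()
            if any(s < r_end and e > r_start for s, e in peaks)
        )
    return result


# Hypothetical stringent peaks for the three proteins.
peaks = {
    "BRD2": [(100, 200)],
    "BRD3": [(150, 300)],
    "BRD4": [(500, 600)],
}
for region, names in proteins_per_region(peaks).items():
    print(region, sorted(names))
```

Counting regions by the size of each protein set gives the numbers needed for an overlap (Venn-style) summary.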

Drug-induced changes of bromodomain protein enrichment levels. For each 
protein (BRD2, BRD3, BRD4) we assessed quantitative changes in their respective
enrichment levels upon drug treatment (I-BET762, JQ1) relative to the levels in the
DMSO control. First, peaks were called for all conditions and proteins as described 
earlier. Next, for each protein separately, we identified genomic regions that were 
enriched in any (union) of the treatment conditions (DMSO, I-BET762 or JQ1). 
Within those regions we quantified enrichment levels as described earlier for deter- 
mining ChIP-seq enrichment levels. As enrichment levels of different proteins are not 
directly comparable, we normalize all enrichments to the median level of the DMSO 
control. 
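
The normalization to the DMSO median can be sketched as below. The data layout (one list of enrichment levels per condition, for a single protein) and the function name are assumptions made for illustration; the values are hypothetical.

```python
from statistics import median


def normalize_to_control(levels_by_condition, control="DMSO"):
    """Divide every enrichment level by the median level of the control."""
    control_median = median(levels_by_condition[control])
    return {
        condition: [value / control_median for value in values]
        for condition, values in levels_by_condition.items()
    }


# Hypothetical enrichment levels for one protein over the same three regions.
levels = {
    "DMSO": [1.0, 2.0, 4.0],
    "JQ1": [0.5, 1.0, 2.0],
}
print(normalize_to_control(levels))
# {'DMSO': [0.5, 1.0, 2.0], 'JQ1': [0.25, 0.5, 1.0]}
```

After this step the control has a median of 1 for each protein, so drug-induced changes can be compared across proteins on a common scale.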

Differential AR-BRD4 enrichment and AR-BRD4 overlap. HPeak, a hidden 
Markov model (HMM)-based peak-calling software* designed for the identification 
of protein-interactive genomic regions, was used for ChIP-seq peak determination. 
For enrichment plots shown in Fig. 3a, c and d, identified peaks for each sample are 
centred by peak summit and average coverage per million was counted within 1,500 bp 
relative to the peak centre. The overlap of AR- and BRD4-enriched regions was
calculated by BEDtools”. The significance of overlap between AR and BRD4 binding
was calculated using a hypergeometric test based on the derived number of associated
genes. The heatmap for AR peak enrichment was generated using a Python-based script
on raw data and visualized using JavaTreeView™. 
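
A hypergeometric test of overlap, as mentioned above, can be sketched with the standard library alone. The upper-tail form shown here (probability of observing at least the given overlap) is an assumption about how the test was applied, and all gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` genes without replacement from
    `population` genes, of which `successes` carry the feature of interest."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes in total, 3,000 associated with AR peaks,
# 2,500 associated with BRD4 peaks, 1,200 associated with both.
p = hypergeom_upper_tail(population=20_000, successes=3_000, draws=2_500, observed=1_200)
print(p)
```

Very small P values can underflow to 0.0 in floating point; a log-space implementation would be needed to report their magnitude.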

Differential ERG enrichment. We identified sites with significant differences in 
ERG levels between DMSO- and JQ1-treated cells. First we focused on concordant 
peaks (see ChIP-seq reproducibility plots) that were overlapping or in the ±5 kb
proximity of annotated gene loci. We defined a gene locus as the union of all of its
known transcripts (Ensembl Genes 73). We used DESeq2 to assess the statistical
significance of differences in ERG enrichment levels. Although DESeq2 was originally
developed for RNA-seq, its statistical model is well-suited to count data in general. We
used the tool’s default multiple hypothesis correction method and report peaks with
significant differences in ERG levels (adjusted P value < 0.1). To assess quantitative
differences in ERG levels at significantly ‘gained’ (positive difference in ERG levels
upon JQ1 treatment) and ‘lost’ (negative difference in ERG levels upon JQ1 treatment)
sites we followed the same procedure as described earlier for determining ChIP-seq
enrichment levels. 
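
DESeq2's default multiple-testing correction is the Benjamini-Hochberg procedure. The sketch below shows only that adjustment step in plain Python, as an illustration of how adjusted P values below 0.1 would be selected; it does not reproduce DESeq2's statistical model or its independent filtering, and the raw P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four peaks.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.1]
print(significant)  # [0, 1, 2]
```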

Murine prostate tumour xenograft model. Four-week-old male SCID C.B17 mice 
were procured from a breeding colony at University of Michigan maintained by our 
group. Mice were anaesthetized using 2% isoflurane (inhalation) and 2 × 10⁶ VCaP
prostate cancer cells suspended in 100 µl of PBS with 50% Matrigel (BD Biosciences)
were implanted subcutaneously into the dorsal flank on both sides of the mice. Once
the tumours reached a palpable stage (100 mm³), the animals were randomized and
treated with either 10 mg kg⁻¹ body weight MDV3100 or 50 mg kg⁻¹ body weight JQ1
(doses previously used in mouse prostate cancer and multiple myeloma models'”) by
oral gavage or intraperitoneally, respectively, for five days a week. Growth in tumour
volume was recorded using digital callipers and tumour volumes were estimated using
the formula (π/6)(L × W²), where L is length of tumour and W is width. Loss of body
weight during the course of the study was also monitored. At the end of the studies mice
were killed and tumours extracted and weighed. Additionally, femur bone marrow,
liver and spleen were harvested to determine spontaneous metastasis by measuring
human Alu sequence. Briefly, genomic DNA from femur bone marrow, liver and
spleen was prepared using Puregene DNA purification system (Qiagen), followed
by quantification of human Alu sequence by human Alu-specific fluorogenic TaqMan
qPCR probes as described previously*°“’. For the CRPC experiment, VCaP tumour-bearing
mice were castrated when the tumours were approximately 200 mm³ in size
and, once the tumours grew back to the pre-castration size, were randomized and treated
with JQ1 or vehicle (D5W) control. All procedures involving mice were approved by
the University Committee on Use and Care of Animals at the University of Michigan 
and conform to all regulatory standards. 
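
The tumour volume estimate from calliper measurements can be written as a one-line calculation. This sketch assumes the standard ellipsoid approximation (π/6)(L × W²), with length and width in millimetres; the function name and the measurements are hypothetical.

```python
import math


def tumour_volume_mm3(length_mm, width_mm):
    """Ellipsoid approximation of tumour volume: (pi/6) * L * W^2."""
    return (math.pi / 6.0) * length_mm * width_mm ** 2


# A hypothetical spherical tumour 6 mm across, close to the ~100 mm^3
# palpable stage at which treatment began.
print(round(tumour_volume_mm3(6.0, 6.0), 1))  # 113.1
```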

Prostate histology and hormone measurement. Four-to-five-week-old male
SCID C.B17 mice were administered vehicle, 10 mg kg⁻¹ MDV3100 or 50 mg kg⁻¹
JQ1 by oral gavage or intraperitoneally, respectively, for five days a week. Highly
hormone-responsive seminal vesicles attached to the prostate were harvested from
mice after 4 weeks of injection. Prostates were fixed in formalin solution and processed
for sectioning. Standard haematoxylin and eosin staining was performed on the
formalin-fixed sections, which were used to image the different lobes of the gland.
To determine testosterone levels, blood samples were collected by cardiac puncture
from mice anaesthetized with isoflurane. The serum was separated from the blood
and stored at −80 °C until assayed. Serum testosterone levels were measured by
ligand assay at the University of Michigan-ULAM Pathology Cores for Animal
Research. 

Extended Data Figure 1 | BET bromodomain inhibitor JQ1 blocks cell
growth, induces apoptosis and transcriptionally suppresses anti-apoptotic 
factor BCL-xl without affecting BRD2/3/4 proteins. a, Cell viability curves 
for the six prostate lines treated with JQ1. N = 6 wells of a 96-well plate per 
condition. b, BET bromodomain proteins are ubiquitously expressed in 
prostate cell lines. AR and MYC protein levels are also shown. GAPDH serves 
as a loading control. c, Knockdown of BET bromodomain proteins attenuates 
cell proliferation and invasion. RT-PCR analyses of BRD2, BRD3 or BRD4 in 
VCaP cells transfected with short interfering RNA (siRNA) against their 
respective transcript or non-targeting (NT) siRNA. Data show mean ± s.e.m.
(n = 3) from one of three independent experiments. d, VCaP- and LNCaP-cell
proliferation after indicated gene knockdown. 20,000 cells were seeded in
24-well plates 24 h post-transfection with siRNAs and counted on day 0, 2, 4
and 6 (n = 3) by Coulter counter. Data show mean ± s.e.m. e, VCaP- and
LNCaP-cell invasion (n = 6) after indicated gene knockdown. JQ1 was used at
500 nM. f, Cell cycle analysis of JQ1-treated prostate cell lines (after 48 h
treatment with JQ1). Data represent three independent experiments. 

g, Induction of apoptosis as determined by appearance of cleaved PARP 
(cPARP) in VCaP prostate cancer cells by JQ1. GAPDH served as a loading 
control. h, Immunoblot demonstrating an increase in cPARP and decrease in
BCL-xl in all three AR-positive cell lines compared with AR-negative PC3 cells
upon JQ1 treatment. i, Relative BCL-xl mRNA levels as determined by TaqMan
qPCR in JQ1-treated cells. Data show mean ± s.e.m. (n = 3) from one of three
independent experiments. j, ChIP-seq data depicting loss of BRD2/3/4 
recruitment to the BCL-xl promoter upon JQ1 treatment in VCaP cells. The 
genome browser representation of BRD2/3/4 binding events on the BCL-xl 
promoter region. The y-axis denotes reads per million per base pair
(r.p.m. bp⁻¹), the x-axis denotes the genomic position. The bottom panel
depicts the H3K27ac mark on the same promoter region in VCaP cells. 

k, Colony formation assays of prostate cell lines. Cells were cultured in the 
presence or absence of 100 and 500 nM of JQ1 for 12 days followed by staining
(top) and quantification (bottom; mean ± s.e.m., n = 6). Representative
photographs of crystal violet stained colonies (except for VCaP) used for
quantification are shown. l, BET bromodomain inhibitor JQ1 does not affect its
target proteins. qRT-PCR analyses of BRD2, BRD3 and BRD4 in prostate
cancer cell line panel treated with two different concentrations of JQ1 for 24 h.
Data show mean ± s.e.m. (n = 3) from one of the three independent
experiments. m, Immunoblot analysis of BRD proteins in prostate cell line
panel treated with JQ1 for 48 h. GAPDH serves as a loading control. Asterisks
in b and m indicate non-specific band. Representative blots shown are from 
triplicate biological experiment. NS, not significant; *P = 0.01; **P = 0.001 by 
two-tailed Student’s t-test.
Extended Data Figure 2 | Effect of JQ1 on AR target genes and on MYC
transcription. a, qRT-PCR analysis of indicated genes in LNCaP and 22RV1
cells treated with varying concentrations of JQ1 for 24 h. Data show
mean ± s.e.m. (n = 3) from one of two independent experiments.
b, Immunoblot analysis of AR and PSA in a panel of prostate cancer cells after
treatment with two different doses of JQ1. GAPDH serves as a loading control.
c, ERG and PSA are transcriptional targets of JQ1. Proteasome inhibitor 
bortezomib does not rescue ERG and PSA levels in JQ1-treated VCaP cells. 
Immunoblot analyses of ERG and PSA in VCaP and PSA in LNCaP cells 
treated with JQ1 followed by incubation with bortezomib as indicated. MYC, 
known to be degraded by proteasome, was used as a positive control for 
bortezomib treatment. GAPDH serves as a loading control. d, GSEA showing 
loss of MYC signature (four-gene set) in AR-positive VCaP, LNCaP and 
22RV1 cells but not AR-negative DU145 cells after JQ1 treatment. Size, 
number of genes in each set; NES, normalized enrichment score. P and false 
discovery rate (FDR) q values indicate statistical significance. e, RT-PCR and 
immunoblot analysis of MYC in JQ1-treated prostate cancer cells. Data 
show mean ± s.e.m. (n = 3) from one of two independent experiments.

f, g, Time-course qRT-PCR and immunoblot analysis of MYC in AR-positive 
VCaP, LNCaP and 22RV1 cells after JQ1 treatment. h, Cycloheximide
(translation inhibitor) treatment does not enhance JQ1-mediated loss of
MYC protein, ruling out post-translational degradation of MYC by JQ1.
Time-course immunoblot analysis of MYC in VCaP, LNCaP and 22RV1 cells
treated with cycloheximide or cycloheximide plus JQ1 as indicated.
Representative blots from two independent experiments are shown. 

i, GAPDH-normalized MYC protein levels are shown. Band intensities from 
d were determined by ImageJ and the plots were generated using GraphPad
Prism. j, MYC knockdown does not affect cell invasion. Box plot shows
invasion of VCaP cells transfected with siNT or siMYC. Inset shows the image
of invaded VCaP cells (n = 6). Right, qRT-PCR of MYC upon siRNA
transfection. Data show mean ± s.e.m. from one of three independent
experiments. k, Exogenous MYC introduction does not rescue JQ1-mediated
cell growth inhibition. Cells were infected with control adeno-LacZ or
adeno-MYC virus. Equal numbers of cells were plated 24 h after infection and
treated with 500 nM JQ1 or I-BET762. Cells were counted (n = 3 wells) and
plotted; day 0 of drug treatment was set at 100%. Data show mean ± s.e.m.
from one of four independent experiments. l, Immunoblot analysis depicts
overexpression of MYC in adeno-MYC infected cells on day 0 and day 7 of the 
experiment. GAPDH serves as a loading control. *P = 0.05; **P = 0.005 by 
two-tailed Student’s t-test.
Extended Data Figure 3 | Physical association of AR with BRD4 and its
disruption by BET bromodomain inhibitor. a, LNCaP nuclear extract was 
fractionated on a Superose-6 column and AR, BRD4 and RNA Pol II were 
analysed by immunoblot analysis. b, c, Representative sensorgrams for 
AR-RNF2, RAS-BRD4(BD1-BD2) and RNF2-BRD4(BD1-BD2) 
interactions by an OctetRED biolayer interferometry. Real-time binding was 
measured by immobilizing biotinylated AR, RAS or RNF2 proteins separately 
on a streptavidin biosensor and subsequent interaction with varying
concentrations of analyte proteins (RNF2 or BRD4(BD1-BD2)) individually.
Immobilized RAS or RNF2 biosensors did not bind with BRD4, indicating that
the AR-BRD4 interaction is specific. Representative sensorgrams from 4-6
independent experiments are shown. d-f, In vitro binding analysis of AR
and indicated domains of BRD4. Equal amounts of in vitro translated
full-length Halo-AR protein and GST-BRD4 domains were combined and 
immunoprecipitated using Halo beads followed by immunoblot analysis with 
anti-GST antibody. g, JQ1 disrupts the endogenous AR-BRD4 interaction.
Extended Data Figure 4 | Changes in genome-wide enrichment profiles of
BRD proteins in response to bromodomain inhibitors. a, Table showing 
high-throughput sequencing read information for ChIP libraries of BRD2, 
BRD3, BRD4, AR, RNA Pol II, ERG, H3K27ac and IgG performed for this 
study. b, ChIP-seq was performed using BRD2, BRD3 and BRD4 antibodies in 
VCaP cells treated with DMSO, JQ1 or I-BET762 for 12 h. Genome-wide 
distribution of BRD2, BRD3 and BRD4 enriched sites. Highly significant peaks 
(see Methods) show relatively high overlap. A large majority of sites are 
occupied by at least two BRD proteins. BRD2 and BRD3 have the most similar 
localization pattern. c, BRD proteins show varying degrees of overlap. Shown is
the ratio of sites occupied by either protein alone (unique) or co-occupied with
another BRD-family protein (overlap). BRD4 shows the largest number of 
unique peaks. d, BET inhibitors JQ1 and I-BET762 attenuate recruitment of 
BRD proteins from chromatin. Enrichment levels for each protein were 
normalized to the median enrichment in vehicle-treated cells. BRD2 and BRD3 
proteins show similar responses to both inhibitors, whereas BRD4 is more 
potently evicted by JQ1. e, BET bromodomain inhibitors deplete target 
proteins from genomic regions with or without AR. Mean enrichment levels 
within each subpanel were normalized to the maximum mean enrichment in 
vehicle-treated cells.
Extended Data Figure 5 | Influence of JQ1 and anti-androgens on
genome-wide recruitment of AR and their effect on DHT-induced AR 
target gene expression. a, Two independent biological replicates of AR 
ChIP-seq experiments in VCaP cells show high correlation of normalized 
enrichment levels (see Methods) in the majority of treatment conditions. R²
values for each biological duplicate are shown. b, Mean enrichment (coverage) 
profiles are similar between biological replicates and different between 
treatment conditions, indicating that no adverse changes in enrichment levels 
are observed between the replicates. c, Bar graph showing total number of AR
peaks in treated VCaP cells. Genome-wide peak calling for AR yielded the
highest number of peaks with DHT (35,390), whereas vehicle control
cells showed only 13,874 peaks. The number of AR peaks was
23,961, 18,264 and 32,212 in the presence of JQ1, MDV3100 and bicalutamide,
respectively. d, Heat map representation of AR binding peaks in different
treatment groups. Genomic target regions are rank-ordered based on the level 
of AR enrichment at each androgen response element (ARE) within −1 kb
and +1 kb flanking the genomic region. e, Venn diagram illustrating the
overlap of AR-bound genes between different treatment groups. f, AR-BRD4
binding on KLK3 and FASN upstream regions. Genome browser
representation of AR and BRD4 binding events on a putative enhancer and
super-enhancer of the AR-regulated KLK3 and FASN genes, respectively. The y-axis
denotes reads per million per base pair (r.p.m. bp⁻¹), the x-axis denotes the
genomic position with a scale bar on top right. g, Expression of AR target genes
in the presence of JQ1, MDV3100 or bicalutamide. Heat maps for VCaP and
LNCaP cells treated with DHT (10 nM), DHT plus JQ1 (0.5 µM), DHT plus
MDV3100 (10 µM) and DHT plus bicalutamide (25 µM). Red arrows indicate
well-characterized AR target genes. h, RT-PCR analysis of AR-regulated 
genes in the VCaP and LNCaP treated cells. To directly compare JQ1 and 
MDV3100 in blocking AR signalling, cells were treated with varying 
concentrations of JQ1 or MDV3100 followed by DHT treatment and analysed 
for AR targets. A reduction in DHT-induced gene expression was
observed for JQ1 even at 100-250 nM, whereas MDV3100 showed a marginal
reduction at 10 µM, demonstrating the higher efficacy of JQ1 in blocking AR
target gene expression. Data show mean ± s.e.m. (n = 3) from one of two
independent experiments.
Extended Data Figure 6 | Effect of JQ1 on the TMPRSS2-ERG loci and
ERG-mediated transcription in VCaP cells. a, Genome browser 
representation of RNA Pol II binding events within the ERG gene body. The 
y-axis denotes reads per million mapped reads per base pair (r.p.m. bp⁻¹),
the x-axis denotes the genomic position and the black arrow indicates the 
region involved in the TMPRSS2-ERG fusion. b, As in a, AR and BRD4 binding 
on the promoter of the ERG 5'-fusion partner TMPRSS2 in VCaP cells. Note 
the reduced RNA Pol II and AR-BRD4 recruitment levels in DHT plus JQ1 
tracks for the ERG gene body and TMPRSS2 promoter respectively. c, High 
reproducibility of ERG ChIP-seq experiments. Biological replicates of ERG 
ChIP-seq experiments show very high correlation of normalized enrichment 
levels (see Methods) in the JQ1- and DMSO-treated conditions. d, Significant 
changes in ERG levels upon JQ1 treatment at ERG-binding sites in the 
proximity of gene loci. Changes in ERG enrichment levels were assessed using
DESeq2. Statistically significant differences were observed for ERG gain and
ERG loss. Significant ERG gains are associated with quantitatively modest 
changes in enrichment level. On the other hand, significant ERG losses are 
associated with greater changes in enrichment levels. Individual number of 
peaks for each panel is shown. e, Genome browser representation of 
ERG-binding events on bona fide ERG-activating target genes. The y-axis 
denotes reads per million per base pair (r.p.m. bp⁻¹), the x-axis denotes
the genomic position. f, Genome browser representation of ERG-binding
events on ERG-repressed target genes. g, TaqMan qRT-PCR analysis of
ERG-activated genes in VCaP cells after JQ1 treatment. h, TaqMan qRT-PCR
analysis of ERG-repressed genes in VCaP cells after JQ1 treatment. Data
represent mean ± standard deviation (s.d.) (n = 3) from one of two
independent experiments. *P = 0.05; **P = 0.005, ***P = 0.0005 by
two-tailed Student's t-test.
Extended Data Figure 7 | BET bromodomain inhibitors reverse
ERG-mediated functions in an isogenic cell line system. a, b, RT-PCR and
immunoblot showing overexpression of ERG in RWPE and PC3 prostate
cell lines. Data represent mean ± s.e.m. (n = 3). c, BET inhibitors block
ERG-induced RWPE and PC3 cell invasion. RWPE and PC3 cells stably
expressing either LacZ or ERG were treated with DMSO (n = 4), 500 nM JQ1
(n = 4) or I-BET762 (n = 4) for 24 h before plating in Matrigel-coated Boyden
chambers. After 48 h cell invasion was quantified. Left, representative
photomicrographs of invaded cells are shown with a 100 µm scale bar
(lower Boyden chamber stained with crystal violet). Right, bar graph shows
fold change in cell invasion, with DMSO-treated LacZ-expressing cells set to 1.
Data represent mean ± s.e.m. from one of three independent experiments.

d, BET inhibitors reverse ERG-induced gene transcription. GSEA of the ERG 
target gene signature (see Methods) in RWPE cells overexpressing ERG 
(RWPE-ERG) and PC3-ERG cells treated with JQ1 or I-BET762 (500 nM)
for 24 h. ERG-induced genes are repressed by JQ1 or I-BET762 treatment.
e, GSEA using a random gene set shows no significant positive or negative
enrichment by JQ1 or I-BET762 treatment in RWPE-ERG and PC3-ERG cells.
NS, not significant; ***P = 0.0001 by two-tailed Student’s t-test.
Extended Data Figure 8 | JQ1 inhibits ETS (ERG/ETV1) factors that
regulate MYC expression in VCaP and LNCaP cells. a, Genome browser 
representation of ERG- and ETV1-binding events on the MYC distal 
enhancer*. JQ1 treatment in VCaP cells reduces ERG enrichment, as shown in 
two independent ERG ChIP-seq experiments. The y axis denotes reads per 
million per base pair (r.p.m. bp⁻¹), the x axis denotes the genomic position.
LNCaP ETV1 ChIP-seq data are based on data from ref. 23 (GEO accession 
code GSM1145322), and show ETV1 recruitment to the MYC distal enhancer. 
b, ChIP-PCR validation of loss of ERG recruitment after JQ1 treatment in 
VCaP cells. Data show mean ± s.d. (n = 3) from one of two independent
experiments. c, d, Knock-down of AR or ETS factor reduces MYC gene
expression in VCaP and LNCaP cells. qRT-PCR for AR, ETS and MYC
expression in siNT, siAR or siETS transfected cells. Data show mean ± s.d.
(n = 3) from one of two independent experiments. e, A cartoon illustrating the 
mechanism of MYC loss by JQ1 in AR-positive VCaP and LNCaP cells. f, Anti- 
androgens but not JQ1 de-repress MYC expression in prostate cancer cells. 
Genome browser representation of AR and RNA Pol II binding events within 
the MYC gene locus. The y axis denotes reads per million per base pair
(r.p.m. bp⁻¹), the x axis denotes the genomic position. Note the AR
recruitment to the same distal enhancer that is occupied by ERG (see Extended 
Data Fig. 8a), indicating that there is competition between the AR and ETS 
factors to bind to this enhancer region to regulate MYC gene expression. 

g, Heat map showing MYC expression values from VCaP microarray gene 
expression data. h, Anti-androgen restores DHT-repressed MYC expression in
VCaP cells. qRT-PCR of MYC in VCaP cells treated with vehicle, DHT
(10 nM), DHT plus JQ1 (500 nM), DHT plus MDV3100 (10 µM) or DHT plus
bicalutamide (25 µM). Inability of JQ1 to de-repress MYC in this setting could
be explained by the fact that both AR and ERG are de-recruited from the MYC
distal enhancer, leading to net loss of MYC expression. i, MDV3100 and not
JQ1 restores DHT-repressed MYC protein levels in VCaP cells. Immunoblot of
MYC protein in VCaP cells pre-treated with vehicle, MDV3100 (10 µM) or
JQ1 (500 nM) for 4 h followed by DHT (10 nM) for 20 h. Data show
mean ± s.d. (n = 3) from one of two independent experiments. NS, not
significant; *P = 0.01; **P = 0.001; ***P = 0.0001 by two-tailed Student’s
t-test.
Extended Data Figure 9 | JQ1 does not affect normal prostate growth and
testosterone levels but reduces testis size in mice. a, Comparison of JQ1 and
MDV3100 treatment on VCaP cell viability in vitro. N = 8 wells of a 96-well
plate per condition. VCaP cells were treated with MDV3100 or JQ1 for 8 days
and assayed for viability with CellTiter-Glo. b, Gross images showing
highly hormone-responsive seminal vesicles attached to prostate gland
(red and black arrows, respectively) from male mice treated for 30 days with
vehicle, JQ1 (50 mg kg⁻¹) or MDV3100 (10 mg kg⁻¹). Vehicle- or JQ1-treated
mice show no change in the appearance of seminal vesicles. By contrast, 
MDV3100-treated animals show remarkable shrinkage of seminal vesicles. 


c, Mice treated with JQ1 do not show any adverse changes to anterior or ventral
prostate morphology. The haematoxylin and eosin images show normal 
morphology of anterior and ventral prostate from vehicle- or JQ1-treated 
mice. MDV3100-treated mice show attenuated remnant glands of anterior or 
ventral prostate. d, Male mice (n = 3 per group) treated with vehicle or JQ1 
for 30 days exhibit similar serum testosterone levels. Data represent the
mean ± s.e.m. e, Gross analysis of testis from mice treated with vehicle or JQ1
for 30 days. f, Testis weight from vehicle control or JQ1-treated mice. Data
represent the mean ± s.e.m. from n = 7 mice per group. NS, not significant;
*P = 0.0001 by two-tailed Student’s t-test.
Extended Data Figure 10 | In vivo effects of BET bromodomain inhibition
in VCaP xenograft model. a, VCaP cells were implanted subcutaneously in
mice and grown until tumours reached a size of approximately 100 mm³.
Xenografted mice were randomized and then received vehicle, 50 mg kg⁻¹ JQ1
or 10 mg kg⁻¹ MDV3100 5 days a week as indicated. Calliper measurements
were taken twice a week. Individual tumour volumes from different treatment 
groups at the end of the experiments with P values are shown. b, MDV3100 
treatment leads to spontaneous metastasis. Mice bearing VCaP xenografts 
(subcutaneously engrafted) treated with vehicle (n = 6) or MDV3100 (n = 6) 
were assessed for spontaneous metastasis to the femur (bone marrow) and soft 
tissues such as liver and spleen. Genomic DNA isolated from these sites was
analysed for metastasized cells by measuring human Alu sequence (by
Alu-qPCR). MDV3100-treated mice showed spontaneous metastasis to femur
and liver. Spleen did not show presence of human Alu sequences. c, As in
a, for mice bearing VCaP xenografts treated with vehicle (n = 6), JQ1 (n = 6) 
or MDV3100 (n = 6). MDV3100-treated but not JQ1-treated mice showed 
metastasis to femur and liver. d, JQ1 or MDV3100 treatment does not affect 
animal weight. Mice from VCaP cell xenograft experiments treated with 
vehicle, 10 mg kg⁻¹ MDV3100 or 50 mg kg⁻¹ JQ1 were weighed at the time
of calliper measurements. e, Individual tumour volume for vehicle- or
JQ1-treated VCaP mouse xenograft (for data shown in Fig. 4c). Mean ± s.e.m.
is plotted. Statistical significance was determined by two-tailed Student’s t-test.

SMYD3 links lysine methylation of MAP3K2 to Ras-driven cancer

Deregulation of lysine methylation signalling has emerged as a common
aetiological factor in cancer pathogenesis, with inhibitors of
several histone lysine methyltransferases (KMTs) being developed 
as chemotherapeutics’. The largely cytoplasmic KMT SMYD3 (SET 
and MYND domain containing protein 3) is overexpressed in nu- 
merous human tumours” *. However, the molecular mechanism by 
which SMYD3 regulates cancer pathways and its relationship to tumor- 
igenesis in vivo are largely unknown. Here we show that methylation 
of MAP3K2 by SMYD3 increases MAP kinase signalling and pro- 
motes the formation of Ras-driven carcinomas. Using mouse models 
for pancreatic ductal adenocarcinoma and lung adenocarcinoma, we 
found that abrogating SMYD3 catalytic activity inhibits tumour devel- 
opment in response to oncogenic Ras. We used protein array technol- 
ogy to identify the MAP3K2 kinase as a target of SMYD3. In cancer 
cell lines, SMYD3-mediated methylation of MAP3K2 at lysine 260 
potentiates activation of the Ras/Raf/MEK/ERK signalling module 
and SMYD3 depletion synergizes with a MEK inhibitor to block Ras- 
driven tumorigenesis. Finally, the PP2A phosphatase complex, a key 
negative regulator of the MAP kinase pathway, binds to MAP3K2 
and this interaction is blocked by methylation. Together, our results 
elucidate a new role for lysine methylation in integrating cytoplas- 
mic kinase-signalling cascades and establish a pivotal role for SMYD3 
in the regulation of oncogenic Ras signalling. 

The Ras family of oncogenes is activated in a large fraction of human
cancers’. To explore possible connections between KMTs and human 
cancers driven by activation of Ras, we surveyed the level of expression 
for 54 known and candidate human KMT genes in pancreas ductal ade- 
nocarcinoma (PDAC), a cancer nearly universally initiated by oncogenic 
Ras mutations. We found that five KMT-encoding genes (SMYD3, MLL5 
(also known as KMT2E), EZH2, SETD5 and WHSC1L1) were consistently
upregulated in human data sets (Extended Data Fig. 1a-c). SMYD3,
which showed the most significant correlation with PDAC in our meta- 
analysis, is reported to be overexpressed in several cancers with elevated 
Ras activity’, and SMYD3 protein expression increases with PDAC devel- 
opment (Extended Data Fig. 2a, b). On the basis of these results, we 
postulated that SMYD3 might have a role in Ras-driven tumorigenesis. 

Little is known regarding SMYD3 cancer-related function in vivo. We 
previously generated Smyd3 mutant mice, which develop normally, and 
are viable and fertile (Extended Data Fig. 2d; data not shown; ref. 6). To 
investigate the role of SMYD3 in Ras-driven cancers, Smyd3 mutant 
mice were crossed with mice harbouring a loxP-Stop-loxP KrasG12D 
knock-in allele (Kras+/LSL-G12D), which allows for the controlled induction
of oncogenic K-Ras and the initiation of tumours in distinct organs,
including the lung and the pancreas”*. PDAC is thought to arise from 
the transdifferentiation of acinar cells into duct-like cells upon activation
of Ras signalling⁹,¹⁰. We observed induction of SMYD3 expression during
this process in mice with pancreas-specific activation of K-Ras
(Fig. 1a) and in an ex vivo acinar-to-ductal metaplasia (ADM) assay"? 
(Fig. 1b). In this assay, SMYD3 was required for efficient duct forma- 
tion from acinar cells (Fig. 1c, d). In vivo, ADM and PDAC initiation 
are triggered in young Kras mutant mice by inducing severe acute pancreatitis
via repeated injections of caerulein’’ (Fig. 1e). In this system,
Smyd3 deletion reduced the appearance of pancreatic intra-epithelial
neoplasia (PanIN) brought on by Kras activation, as determined by histopathological
analysis and decreased signal for both phosphorylated
ERK1/2 (pERK1/2, a downstream marker of Ras activity) and MUC5
(a marker of PanINs) (Fig. 1f, g). In the absence of a pancreatitis trigger,
PanIN lesions develop by 6 months in p48Cre/+;Kras+/LSL-G12D mice’,
a process that was attenuated by Smyd3 loss (Fig. 1h; Extended Data
Fig. 3a). Next, to study PDAC growth and to perform survival studies,
we used the p48Cre/+;Kras+/LSL-G12D;p53lox/lox (Kras;p53) mutant model
(p48 and p53 are also known as Ptf1a and Trp53, respectively), which is
characterized by rapid PanIN-to-PDAC progression and malignant transformation
with 100% penetrance in a relatively short latency (~50-60 days)'*.
At autopsy, the pancreatic tissue from Kras;p53 mutant mice
was entirely occupied by transformed cells, whereas areas of normal 
pancreatic tissue remained in Kras;p53;Smyd3 mutant mice (Extended 
Data Fig. 3b, c). Furthermore, loss of Smyd3 extended the lifespan of 
the animals (Fig. 1i) and resulted in reduced levels of the PDAC bio- 
marker pERK1/2 in biopsy samples (Fig. 1j; Extended Data Fig. 3b). 
Notably, K-Ras expression was not affected by SMYD3 deletion (Ex- 
tended Data Fig. 3f). Based on these data, we conclude that SMYD3 is 
required for efficient initiation of pancreatic cancer by oncogenic K-Ras. 

Oncogenic activation of the Ras pathway is a frequent event in lung 
adenocarcinoma, a cancer that also shows high SMYD3 expression (Ex- 
tended Data Figs 1d and 2c). Intratracheal injection of an adenovirus 
expressing the Cre recombinase (Ad-Cre) in adult Kras+/LSL-G12D mice
led to the development of atypical adenomatous hyperplasia (AAH) and 
adenomas in the lungs within 12 weeks’, irrespective of Smyd3 status 
(Fig. 2a, b; Extended Data Fig. 3d). In contrast, at 16 and 20 weeks or 
more after Ad-Cre infection, mice lacking Smyd3 showed significantly 
smaller and less advanced tumours than control mice (Fig. 2a, c, d; 
Extended Data Fig. 3d; data not shown). Specifically, quantification of 
tumour grade indicated that Smyd3 loss impeded the critical transition 
from adenoma to adenocarcinoma (Fig. 2c), which was also observable 
at the whole-organ level (Fig. 2d). Moreover, the lifespan of
KrasG12D-expressing mice was 20% longer if they were mutant for Smyd3 (Fig. 2e).
Progression of lung cancer to carcinoma correlates with amplification 
of Ras/MEK/ERK signalling'*’°. Smyd3 deletion resulted in lower detection
of pERK1/2 relative to control tumours without an overall change
Figure 1 | SMYD3 loss inhibits Ras-driven pancreatic tumorigenesis.
a, Representative immunohistochemistry (IHC) images showing SMYD3
expression in cells undergoing acinar-to-ductal metaplasia (ADM, arrowheads)
but not in acini (asterisk) in p48Cre/+;Kras+/LSL-G12D (Kras) mice. b, Smyd3
expression increases during ADM formation. Quantitative PCR with reverse 
transcription (qRT-PCR) analysis of Smyd3 expression at the indicated times 
from control- and EGF-induced ADM ex vivo samples (four independent 
biological replicas). c, SMYD3 depletion inhibits ADM. Wild-type (WT,
p48Cre/+) acinar clusters (asterisk) undergo ADM and form ducts (arrowhead)
ex vivo, whereas Smyd3 mutant acini explants inefficiently form ducts.
d, Quantification of acinar and ductal clusters on day 3 of culture as
in c (four independent biological replicas with three technical replicas each).
e, Schematic of the caerulein pancreatitis-induced tumorigenesis protocol’’.
f, Representative haematoxylin and eosin (HE) staining and IHC for pERK1/2,
a marker of Ras activity, and MUC5, a marker of PanIN lesions (arrowheads).
g, Quantification of MUC5-positive lesions in caerulein-treated pancreata from
Kras (n = 6) and Kras;Smyd3 (n = 6) mutant mice. h, Quantification of 
spontaneous PanIN lesions formed in 6-month-old Kras (n = 8) and 
Kras;Smyd3 (n = 8) mutant mice. The grade of lesions is indicated. i, Kaplan-Meier
survival of Kras;p53 mutant mice (p48Cre/+;Kras+/LSL-G12D;p53lox/lox;
n = 33, median survival = 56 days) and Kras;p53;Smyd3 mutant mice (n = 21,
median survival = 68.5 days). P = 0.0005 by log-rank test for
significance. j, Immunoblots with the indicated antibodies of Kras;p53 and 
Kras;p53;Smyd3 mutant pancreatic tumour lysates. Loss of SMYD3 was also 
confirmed by immunostaining of pancreatic sections (Extended Data Fig. 4d). 
All scale bars, 50 µm. *P value < 0.05; **P value < 0.01; n.s., not significant
(two-tailed unpaired Student's t-test). Data are represented as mean ± s.e.m.
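
The summary statistics named in this legend (mean ± s.e.m. and the two-tailed unpaired Student's t-test) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' analysis code; the lesion counts are hypothetical, and the function names are introduced here for illustration. It computes the t statistic and degrees of freedom; converting these to a two-tailed P value requires the t distribution, which is left to a statistics package.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in the two
    groups, which is what the classical Student's test assumes.
    """
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, na + nb - 2


# Hypothetical lesion counts for two groups of six mice (not data from the paper).
control = [42.0, 38.0, 45.0, 40.0, 44.0, 39.0]
mutant = [21.0, 25.0, 19.0, 24.0, 22.0, 20.0]

for name, group in (("control", control), ("mutant", mutant)):
    m, sem = mean_sem(group)
    print(f"{name}: {m:.2f} ± {sem:.2f} (n = {len(group)})")

t_stat, dof = student_t(control, mutant)
print(f"t = {t_stat:.2f}, d.f. = {dof}")
```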


in total levels of Ras (Fig. 2f; Extended Data Figs 2a and 3f). Together, 
these observations indicate that SMYD3 promotes Ras-driven cancer 
development and progression in vivo. 
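
The reported extension in lifespan can be checked against the median survival values given in the figure legends. The short sketch below does that arithmetic; the helper name is introduced here for illustration, and the medians are those stated in the legends of Fig. 2e (lung model) and Fig. 1i (pancreatic model).

```python
def percent_increase(control, treated):
    """Relative increase of `treated` over `control`, in percent."""
    return 100.0 * (treated - control) / control


# Median survival in days, taken from the figure legends.
lung = percent_increase(144.5, 174.0)      # Kras vs Kras;Smyd3 (Fig. 2e)
pancreas = percent_increase(56.0, 68.5)    # Kras;p53 vs Kras;p53;Smyd3 (Fig. 1i)

print(f"lung: {lung:.1f}%  pancreas: {pancreas:.1f}%")
# prints "lung: 20.4%  pancreas: 22.3%"
```

The lung value agrees with the roughly 20% longer lifespan stated in the text; by the same arithmetic the pancreatic model shows a similar relative gain in median survival.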

Depletion of SMYD3 by RNA-mediated interference (RNAi) using 
a short hairpin RNA (shRNA) strategy in LKR10 mouse cells (a LAC-derived
cell line'®), A549 (a human LAC cell line) and CFPac1 (a human
PDAC cell line) reduced the proliferation rates for all three cell types,
and inhibited their ability to grow in anchorage-independent conditions
(Extended Data Fig. 4a-c). Furthermore, knockdown of SMYD3 in CFPac1
cells inhibited tumour growth in mouse xenograft experiments (Extended
Data Fig. 4d-f). Thus, SMYD3 acts to maintain a number of tumorigenic
characteristics in mouse and human cancer cell lines driven by oncogenic
Ras.

Next, wild-type SMYD3, a catalytically inactive form (SMYD3(F183A))°*, 
or vector control were co-expressed with the Cre recombinase in the 
lungs of Kras;Smyd3 mutant mice by lentiviral transduction (Extended 
Data Fig. 5a). Complementation of wild-type SMYD3 into the lungs of
Figure 2 | SMYD3 loss inhibits the development of Ras-driven lung
adenocarcinoma. a, Quantification of tumour area per lung at the indicated 
time points after tumour induction (n = 6 for each time point and genotype). 
b, Total number of tumour lesions at 12 weeks post tumour induction (n = 6 
for each genotype). Data are represented as mean ± s.e.m. *P value < 0.05;
**P value < 0.01; n.s., not significant (two-tailed unpaired Student’s t-test). 

c, Quantification of tumour grade at 16 and 20 weeks (n = 6 for each time point 
and genotype). d, Representative lung images at the endpoint of survival 
studies. Scale bars, 1 cm. e, Survival analysis of Kras (n = 16, median
survival = 144.5 days) and Kras;Smyd3 (n = 21; median survival = 174 days)
mutant mice, time post infection. P < 0.0001 by log-rank test for significance.
f, Immunoblots of lung tumour lysates dissected from Kras and Kras;Smyd3 
mutant mice (two independent biological replicates for each genotype) with 
the indicated antibodies. Loss of SMYD3 was also confirmed by 
immunostaining on lung sections (Extended Data Fig. 3f). 


Kras;Smyd3 mutant mice resulted in a higher tumour burden and pERK1/2 
signal relative to the control Cre-alone infection and expression of 
mutant SMYD3(F183A) (Fig. 3a-c). Reconstitution experiments also
demonstrated that SMYD3 catalytic activity is required for pancreatic 
ADM (Extended Data Fig. 5b, c). We previously reported that SMYD3 
methylates histone H4 at lysine 5 (H4K5) and not at other lysines on 
histones®. However, in LAC and PDAC cells and tumours, virtually all 
SMYD3 is present in the cytoplasm (Fig. 1a and Extended Data Figs 2a, 
c, 3e and 6b), indicating that the cancer-relevant substrate in these cell 
types is unlikely to be nuclear H4K5 but rather a cytoplasmic protein. 

In a biochemical screen for SMYD3 targets on a protein array plat- 
form containing more than 9,000 potential substrates’”, the only can- 
didate to be methylated by SMYD3 in three independent experiments 
was the MAP kinase pathway component MAP3K2 (Extended Data 
Fig. 6a). As shown in Fig. 3d, recombinant wild-type SMYD3, but not 
the catalytically dead SMYD3(F183A) mutant, methylated recombin- 
ant MAP3K2 in vitro. Using a mutagenesis approach, we identified lysine 
260 of MAP3K2 as the single site of methylation catalysed by SMYD3 
(Fig. 3e; data not shown). The immediate sequence surrounding K260 
of MAP3K2 and K5 of H4 is identical (GKGG), although the catalytic 
efficiency (kcat/Km) of SMYD3 for MAP3K2 is nearly two orders of magnitude
greater than it is for H4 (Extended Data Fig. 6c). We also did
not detect any methylation of H3 at lysine 4 by SMYD3, a previously 
reported activity’ (Extended Data Fig. 6d, e). In addition, SMYD3 was 
the only KMT of the eleven we tested that could methylate MAP3K2 
(Fig. 3f, g; Extended Data Fig. 6f). Furthermore, whereas SMYD3 meth- 
ylated MAP3K2, it had no detectable activity on a dozen other mem- 
bers of the MAP kinase signalling cascade (Fig. 3h). Finally, in vitro 
methylation assays on MAP3K2 peptides spanning K260 (amino acids 
249 to 273) with K260 either unmethylated, mono-, di- or tri-methylated 
showed that SMYD3 can use all lower states of methylation as substrates 
to generate the fully saturated trimethyl state at K260 (Extended Data 
Fig. 6g, h). Thus, SMYD3 mono-, di- and tri-methylates MAP3K2 at 
lysine 260 (MAP3K2-K260me) with high specificity in vitro. 

To investigate MAP3K2 methylation in cells, we raised methyl-specific 
antibodies against the different states of methylation at K260 (Extended 
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Figure 3 | SMYD3 methylates MAP3K2 in cancer cells. a, Analysis of 
lung cancer development in Kras;Smyd3 mutant mice following infection 
with a lentivirus expressing Cre only or simultaneously Cre and WT SMYD3 
or inactive SMYD3(F183A). Histological analysis (HE staining) and 

IHC for pERK1/2 were performed 24 weeks post lentiviral infection. IHC 
confirms lentiviral-mediated expression of SMYD3. Scale bars, 50 µm. 

b, c, Quantification of total tumour area per lung and pERK1/2 positive area 
per lung, respectively (n = 4 for each experimental group). Data are 
represented as mean ± s.e.m. *P value < 0.05 (two-tailed unpaired Student’s 
t-test). d, SMYD3 directly methylates MAP3K2. In vitro methylation assay 
on full-length recombinant MAP3K2 with recombinant wild-type SMYD3, 
catalytic-dead SMYD3(F183A), or glutathione-S-transferase (GST) control. 
Top panel, autoradiogram of methylation assay. Bottom panel, Coomassie 
stain of proteins in the reaction. e, SMYD3 methylates MAP3K2 at K260. 

In vitro methylation assay as in d with the indicated proteins on MAP3K2 
amino acids 1-350, MAP3K2 amino acids 1-350 with a K260A substitution, 
and MAP3K2 amino acids 351-619. Arrow indicates GST, which is a stable 
breakdown product of recombinant proteins. f, g, MAP3K2 is a specific 
substrate of SMYD3. In vitro methylation assays as in d on MAP3K2 using the 
indicated KMTs (positive controls for the known KMTs shown in Extended 
Data Fig. 6f). h, SMYD3 specifically methylates MAP3K2. In vitro SMYD3 
methylation assay as in d on the indicated MAP kinase pathway proteins. 
Asterisk indicates MAP3K2 breakdown product. Arrow indicates GST, 
which is a stable breakdown product for many of the screened substrates. 

i, Immunoblots with the indicated antibodies from input (cytoplasmic extract) 
or the indicated IPs (immunoprecipitations) from LKR10 cells stably 
expressing control or Smyd3 shRNA. j, Immunoblots with the indicated 
antibodies and samples as in i of lung tumour biopsy lysates isolated from Kras 
and Kras;Smyd3 mutant mice. Asterisks represent detection of IgG. For 
experiments e-k representative data based on three or more independent 
biological replicates are shown. 


Data Fig. 6i). In co-transfection experiments in human 293T cells, over- 
expressed MAP3K2 was methylated at K260 upon SMYD3 overexpres- 
sion (Extended Data Fig. 6j). Endogenous methylation at MAP3K2-K260 
was observed in LKR10 cells and RNAi-mediated depletion of SMYD3 
in these cells resulted in loss of this signal (Fig. 3i). Finally, the MAP3K2- 
K260me2/3 signal was significantly reduced in tumour tissue micro- 
dissected from Kras versus Kras;Smyd3 mutant mice (Fig. 3j). Thus, 
SMYD3 is required for maintenance of physiological levels of MAP3K2- 
K260 methylation in cancer tissue and cells. 

The cytoplasmic kinase MAP3K2 is activated in response to a vari- 
ety of stress and mitogenic stimuli, including epidermal growth factor 
(EGF), and relays signals to downstream MAP kinase components such 
as ERK1/2 (ref. 18) and ERK5 (ref. 19). Given that the pERK1/2 and pERK5 signals 
are reduced in LAC and PDAC samples from Smyd3 mutant mice 
(Figs 1 and 2; Extended Data Fig. 3), we reasoned that SMYD3-mediated 
methylation of MAP3K2 may regulate signalling within the Ras/ERK 
pathway. We therefore examined the relationship between SMYD3 and 
EGF-stimulated ERK1/2 activation. First, endogenous SMYD3 was de- 
pleted in LKR10 cells by shRNA targeting the 3’ untranslated region of 
Smyd3 and then we reconstituted the depleted cells with RNAi-resistant 
wild-type SMYD3 or catalytically inactive SMYD3(F183A). In control 
cells, EGF treatment triggered ERK1/2 phosphorylation, and this res- 
ponse was greatly reduced by SMYD3 depletion (Fig. 4a). Complemen- 
tation with wild-type SMYD3 re-established the EGF-mediated ERK1/2 
phosphorylation response, whereas complementation with SMYD3(F183A) 
failed to do so (Fig. 4a). SMYD3 was also required when serum was used 
to induce ERK1/2 activation in LKR10 cells (Extended Data Fig. 7b) 
and when EGF was used to activate ERK1/2 in human lung and pan- 
creatic cancer cells (Extended Data Fig. 7c, d). Finally, we established a 
MAP3K2 complementation system to investigate the role of K260 of 
MAP3K2. In these experiments, wild-type MAP3K2, but not the SMYD3- 
resistant MAP3K2(K260A) mutant, reconstituted the EGF-mediated 
ERK1/2 phosphorylation response (Fig. 4b). 

To characterize how the SMYD3-MAP3K2 axis impinges on the over- 
all MAP kinase network, the level of EGF-induced activation for several 
kinases was determined in control, SMYD3 knockdown and MAP3K2 
knockdown LKR10, A549 and CFPac1 cells (Fig. 4c; Extended Data 
Fig. 7a-d). SMYD3 and MAP3K2 were both required for full activa- 
tion of ERK5, ERK1/2, MEK1/2 and JNK (known downstream targets 
of MAP3K2; refs 18-22), but dispensable for activation of AKT and 
CRaf (RAF-1) (p38 was not activated in the cell lines tested; Extended 
Data Fig. 7a—d). Notably, MEK1/2 activation was impaired in SMYD3 
and MAP3K2 knockdown cell lines. As the canonical ERK1/2 activa- 
tion pathway consists of Raf-MEK1/2-ERK1/2 and both SMYD3 and 
MAP3K2 were required for MEK1/2 and ERK1/2 activation but not 
CRaf, we postulated that SMYD3 methylation of MAP3K2 directly in- 
fluences MEK1/2 phosphorylation. Indeed, pMEK1/2 signal increased 
in response to overexpression of wild-type MAP3K2 and a SMYD3- 
resistant MAP3K2(K260A) mutant, but not a catalytically dead MAP3K2(K385M) 
mutant (ref. 23) (Fig. 4d, e; Extended Data Fig. 7e). SMYD3 expression 
alone had no effect on MEK1/2 activation (Extended Data Fig. 7e), 
but resulted in increased MEK1/2 phosphorylation when co-expressed 
with wild-type MAP3K2 relative to co-expression with MAP3K2(K260A) 
(Fig. 4d, e). Notably, MAP3K2 phosphorylates MEK1 in in vitro kinase 
assays (ref. 21), and this activity was unchanged if SMYD3-methylated MAP3K2 
or MAP3K2(K260A) were used in the kinase assays rather than wild-type, 
unmethylated MAP3K2 (Extended Data Fig. 7f). These results indicate 
that the molecular mode of action linking MAP3K2 methylation to 
MEK1/2 activation is not due to changes in the intrinsic kinase activity 
of MAP3K2 but rather another mechanism. 

Given the role of SMYD3-MAP3K2 in activating MEK1/2, we tested 
whether SMYD3 depletion augments the effects of the MEK1/2 inhib- 
itor trametinib, which is currently being investigated to treat NSCLC and 
PDAC (http://clinicaltrials.gov/). Treatment of Kras and Kras;Smyd3 
mutant mice with a normal dose of trametinib blocked tumorigenesis 
in both strains, although phosphorylation of ERK1/2 was still lower in 
mice depleted of SMYD3 (Extended Data Fig. 8). Notably, a low-dose 
trametinib regimen, which only partially inhibited pERK1/2 levels and 
the formation of neoplastic lesions in Kras mutant mice, was sufficient 
to block tumorigenesis and ERK1/2 activation in Smyd3 knockouts (Ex- 
tended Data Fig. 8). Trametinib was also more potent in cancer cell lines 
when coupled with SMYD3 depletion (Extended Data Fig. 9a). These 
data indicate that SMYD3 may act in concert with MEK1/2 signalling 
in tumorigenesis. Indeed, overexpression of activated MEK1(S218D, 
S222D) (MEK1-DD) (ref. 24) rescued the effects of SMYD3 depletion in lung 
cancer cells (Extended Data Fig. 9b). 

Because the intrinsic kinase activity of MAP3K2 is not directly altered 
by methylation, we postulated that this modification event was involved 
in modulating a key protein-protein interaction. To identify candidate 
methyl-sensitive binding partners of MAP3K2, a SILAC (stable isotope 
labelling by amino acids in cell culture)-based quantitative proteomic 
screen was performed with cytoplasmic extracts to isolate proteins that 
bound differentially to MAP3K2-K260me0 peptides versus MAP3K2- 
K260me3 peptides. Although this analysis did not show enrichment of 
any K260me3-binding proteins, it did reveal six candidates that bind 
to the unmethylated peptide but are blocked by K260 trimethylation. 
Of these six proteins, three are members of the PP2A serine/threonine 
phosphatase complex (Fig. 4f). In our proteomics experiment we iden- 
tified the catalytic PPP2CA protein, the structural PPP2R1A protein and 
the regulatory PPP2R2A protein, three classes of subunits that comprise 
the typical heterotrimeric PP2A complex. This complex is a major cellular 
phosphatase that inactivates key members of the MAP kinase signalling 
cascade (reviewed in refs 25, 26). We found that the interaction between 
the PP2A complex and MAP3K2 is direct, as recombinant PPP2R2A, 
but not PPP2R1A, bound to MAP3K2-K260me0 peptides and not to 
MAP3K2-K260me3 peptides in in vitro peptide pull-down experiments 
(Fig. 4g; top panel). Moreover, PPP2R2A was specifically recovered 
from cytoplasmic extracts using MAP3K2-K260me0 peptides but not 
by MAP3K2-K260me3 peptides (Fig. 4g; middle panel). Thus, amino 
acids 249-273 of MAP3K2 are sufficient for binding directly to the PP2A 
complex, via PPP2R2A, and methylation at K260 inhibits this interaction. 

We next tested the ability of the PP2A inhibitor cantharidin (ref. 27) to 
‘phenocopy’ SMYD3 function (Fig. 4h; Extended Data Fig. 10a-d). Cantharidin 
treatment had no gross effect on tumour formation in p48+/Cre;Kras 
mutant mice relative to vehicle treatment; in contrast, administration 
of cantharidin to p48+/Cre;Kras;Smyd3 mutant mice restored tumour 
formation to the level seen with wild-type SMYD3 in the p48+/Cre;Kras 
mutant mice. These data suggest an in vivo functional connection between 
the Ras pathway, SMYD3 and PP2A. 

We have identified SMYD3-catalysed methylation of MAP3K2 as a 
key event regulating Ras signalling in cancer cells. Although MAP3K2 
Figure 4 | SMYD3 methylation of MAP3K2 activates MAP kinase signalling 
pathways and repels PP2A. a, SMYD3 catalytic activity is required for ERK1/2 
activation in LAC cells. Immunoblots with the indicated antibodies of 
LKR10 cell lysates depleted for SMYD3 and reconstituted with either active or 
inactive Flag-SMYD3 as indicated. Stimulation, EGF treatment for 15 min at 
25 ng µl⁻¹. b, MAP3K2 methylation is required for ERK1/2 activation in 
LAC cells. Immunoblot with the indicated antibodies of LKR10 cell lysates 
treated as in a depleted for MAP3K2 and reconstituted with either wild-type or 
K260A mutant Flag-MAP3K2. c, SMYD3 and MAP3K2 regulate multiple 
overlapping MAP kinase pathway proteins. Quantification of the indicated 
activated kinase signals in shSMYD3 and shMAP3K2 relative to shControl cell 
lysates based on three independent biological replicates treated as in Extended 
Data Fig. 7a. d, SMYD3 catalytic activity promotes MAP3K2-induced 
phosphorylation of MEK1/2. Immunoblots with the indicated antibodies of 
293T cell lysates transfected with Flag-MAP3K2 and haemagglutinin 
(HA)-SMYD3 wild-type and derivatives as indicated. MAP3K2(K385M) is a 
kinase-dead mutant. e, Quantification of pMEK and pERK1/2 signals in 293T 
cells transfected with the indicated MAP3K2 and SMYD3 constructs and 
treated as in d. Data were generated from three independent biological replicates. 
f, SILAC-based quantitative proteomic analysis of proteins that bind to 
MAP3K2-K260me0 and MAP3K2-K260me3 peptides. Data represent two 
independent experiments (forward and reverse direction). Proteins are plotted 
by their SILAC ratios in the forward (x axis) and reverse (y axis) SILAC 
experiments. Specific interactors of K260me0 reside in the lower left quadrant. 
The three PP2A complex components are highlighted in blue. L/H, light over 
heavy fraction ratio. g, PPP2R2A directly binds to MAP3K2 peptides 
encompassing amino acids 249-273 and this interaction is inhibited by K260 
methylation. Immunoblots of peptide pull-downs as indicated with either 
recombinant proteins (top panel) or HeLa cytoplasmic extracts (middle panel) 
(based on two replicates). The dot blot in the bottom panel shows equivalent 
amounts of peptides used for the experiments. h, Quantification of 
MUC5-positive lesions in caerulein-treated pancreata from Kras (n = 5, each 
treatment) and Kras;Smyd3 (n = 5, each treatment) mutant mice treated 
with the PP2A inhibitor cantharidin (iPP2A) (0.15 mg kg⁻¹ twice a day, 
intraperitoneally) or vehicle control (see Extended Data Fig. 10). 

**P value < 0.01; ***P value < 0.001; n.s., not significant (two-tailed unpaired 
Student’s t-test). Data are represented as mean ± s.e.m. 


was shown to phosphorylate MEK1/2 directly, a complete understand- 
ing of how MAP3K2 functions in Ras signalling remains to be deter- 
mined. Our data suggest a model in which increased SMYD3 activity 
generates a population of methylated MAP3K2, which—via mechan- 
isms such as blocking the association of the PP2A phosphatase with 
components of the MAP kinase network—intensifies the output of this 
pathway in response to oncogenic Ras (Extended Data Fig. 10e). In addi- 
tion, the cytoplasmic localization of the SMYD3-MAP3K2 dynamic 
suggests a paradigm for how signalling through lysine methylation and 
phosphorylation can be integrated to regulate key signal transduction 
cascades. A clinical implication of this work is the identification of 
SMYD3 as a candidate therapeutic target for pharmacologic interven- 
tion to treat pancreatic and lung cancers, as well as potentially other Ras- 
driven tumours. The complete loss of SMYD3 function has no visible 
phenotype in mice, suggesting that SMYD3 inhibitors would have min- 
imal collateral toxicity as chemotherapeutics. Thus, one could envision 
a therapeutic strategy comprising inhibitors of Raf or MEK that are cur- 
rently used in the clinic with a SMYD3 inhibitory agent, which could 
mitigate potential drug toxicity by lowering the overall dose needed for 
each medicine and combat the development of resistance. Together, our 
findings reveal a new function for lysine methylation signalling in the 
cytoplasm in the regulation of cancer pathways. 


METHODS SUMMARY 


Animal studies were performed according to practices prescribed by the NIH at 
Stanford’s Research Animal Facility accredited by the Association for Assessment 
and Accreditation of Laboratory Animal Care. The Smyd3 mutant mice used in lung 
and pancreatic cancer experiments were initially obtained from the KOMP Repository. 
SILAC was performed using extracts from HeLa cells grown under either normal amino acid 
culture conditions (‘light’) or modified amino acid culture conditions (‘heavy’). 
Full methods and data analysis are described in detail in Methods. 


©2014 Macmillan Publishers Limited. All rights reserved 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 


Received 25 September 2013; accepted 11 April 2014. 
Published online 21 May; corrected online 11 June 2014 (see full-text HTML 
version for details). 


1. Helin, K. & Dhanak, D. Chromatin proteins and modifications as drug targets. 
Nature 502, 480-488 (2013). 

2. Watanabe, T. et al. Differential gene expression signatures between colorectal 
cancers with and without KRAS mutations: crosstalk between the KRAS pathway 
and other signalling pathways. Eur. J. Cancer 47, 1946-1954 (2011). 

3. Gaedcke, J. et al. Mutated KRAS results in overexpression of DUSP4, a MAP-kinase 
phosphatase, and SMYD3, a histone methyltransferase, in rectal carcinomas. 
Genes Chromosom. Cancer 49, 1024-1034 (2010). 

4. Hamamoto, R. et al. SMYD3 encodes a histone methyltransferase involved in the 
proliferation of cancer cells. Nature Cell Biol. 6, 731-740 (2004). 

5. Pylayeva-Gupta, Y., Grabocka, E. & Bar-Sagi, D. RAS oncogenes: weaving a 
tumorigenic web. Nature Rev. Cancer 11, 761-774 (2011). 

6. Van Aller, G. S. et al. Smyd3 regulates cancer cell phenotypes and catalyzes histone 
H4 lysine 5 methylation. Epigenetics 7, 340-343 (2012). 

7. Jackson, E. L. et al. Analysis of lung tumor initiation and progression using 
conditional expression of oncogenic K-ras. Genes Dev. 15, 3243-3248 
(2001). 

8. Hingorani, S. R. et al. Preinvasive and invasive ductal pancreatic cancer and its 
early detection in the mouse. Cancer Cell 4, 437-450 (2003). 

9. Zhu, L., Shi, G., Schmidt, C. M., Hruban, R. H. & Konieczny, S. F. Acinar cells 
contribute to the molecular heterogeneity of pancreatic intraepithelial neoplasia. 
Am. J. Pathol. 171, 263-273 (2007). 

10. Guerra, C. et al. Chronic pancreatitis is essential for induction of pancreatic ductal 
adenocarcinoma by K-Ras oncogenes in adult mice. Cancer Cell 11, 291-302 
(2007). 

11. Means, A. L. et al. Pancreatic epithelial plasticity mediated by acinar cell 
transdifferentiation and generation of nestin-positive intermediates. Development 
132, 3767-3776 (2005). 

12. Morris, J. P. t., Cano, D. A., Sekine, S., Wang, S. C. & Hebrok, M. β-catenin blocks Kras- 
dependent reprogramming of acini into pancreatic cancer precursor lesions in 
mice. J. Clin. Invest. 120, 508-520 (2010). 

13. Bardeesy, N. et al. Both p16Ink4a and the p19Arf-p53 pathway constrain 
progression of pancreatic adenocarcinoma in the mouse. Proc. Natl Acad. Sci. USA 
103, 5947-5952 (2006). 

14. Feldser, D. M. et al. Stage-specific sensitivity to p53 restoration during lung cancer 
progression. Nature 468, 572-575 (2010). 

15. Junttila, M. R. et al. Selective activation of p53-mediated tumour suppression in 
high-grade tumours. Nature 468, 567-571 (2010). 

16. Johnson, L. et al. Somatic activation of the K-ras oncogene causes early onset lung 
cancer in mice. Nature 410, 1111-1116 (2001). 

17. Levy, D. et al. A proteomic approach for the identification of novel lysine 
methyltransferase substrates. Epigenetics Chromatin 4, 19 (2011). 




18. Maruyama, T. et al. CHIP-dependent termination of MEKK2 regulates temporal 
ERK activation required for proper hyperosmotic response. EMBO J. 29, 
2501-2514 (2010). 

19. Sun, W. et al. MEK kinase 2 and the adaptor protein Lad regulate extracellular 
signal-regulated kinase 5 activation by epidermal growth factor via Src. Mol. Cell. 
Biol. 23, 2298-2308 (2003). 

20. Fanger, G. R., Johnson, N. L. & Johnson, G. L. MEK kinases are regulated by EGF and 
selectively interact with Rac/Cdc42. EMBO J. 16, 4961-4972 (1997). 

21. Choi, M. C. et al. A direct HDAC4-MAP kinase crosstalk activates muscle atrophy 
program. Mol. Cell 47, 122-132 (2012). 

22. Matitau, A. E., Gabor, T. V., Gill, R. M. & Scheid, M. P. MEKK2 kinase association with 
14-3-3 protein regulates activation of c-Jun N-terminal kinase. J. Biol. Chem. 288, 
28293-28302 (2013). 

23. Enomoto, A. et al. Negative regulation of MEKK1/2 signaling by serine-threonine 
kinase 38 (STK38). Oncogene 27, 1930-1938 (2008). 

24. Brunet, A., Pages, G. & Pouyssegur, J. Constitutively active mutants of MAP kinase 
kinase (MEK1) induce growth factor-relaxation and oncogenicity when expressed 
in fibroblasts. Oncogene 9, 3379-3387 (1994). 

25. Raman, M., Chen, W. & Cobb, M. H. Differential regulation and properties of MAPKs. 
Oncogene 26, 3100-3112 (2007). 

26. Eichhorn, P. J., Creyghton, M. P. & Bernards, R. Protein phosphatase 2A regulatory 
subunits and cancer. Biochim. Biophys. Acta 1795, 1-15 (2009). 

27. Li, Y. M. & Casida, J. E. Cantharidin-binding protein: identification as protein 
phosphatase 2A. Proc. Natl Acad. Sci. USA 89, 11867-11870 (1992). 


Acknowledgements We thank members of the Gozani and Sage laboratories for 
critical reading of the manuscript, A. Smits for help with mass spectrometry data 
visualization and P. J. Utz and the Floren Family Trust for providing ProtoArrays. This 
work was supported in part by grants from the NIH to O.G. and J.S. (RO1 CA172560) 
and an NIH Innovator grant (DP2 OD007447) from the Office of the Director for B.A.G.; 
M.V. was supported by a grant from NWO-VIDI, P.K.M. was supported by the 
Tobacco-Related Disease Research Program, a Dean’s Fellowship from Stanford 
University, and the Child Health Research Institute and Lucile Packard Foundation for 
Children’s Health at Stanford. N.R. was supported by a grant from the Fondation pour la 
Recherche Médicale. J.S. is the Harriet and Mary Zelencik Scientist in Children’s Cancer 
and Blood Diseases. 


Author Contributions N.R. and P.K.M. contributed equally to this work and are listed 
alphabetically. They were responsible for the experimental design, execution, data 
analysis and manuscript preparation. P.K. and A.J.B. performed the bioinformatics 
meta-analysis. P.W.T.C.J. and M.V. performed the SILAC experiments. S.L. and B.A.G. 
performed the methylated peptide mass spectrometry experiments. A.W.W. generated 
recombinant H3 and H3KAR protein. O.B., G.S.V., M.H., D.D., P.J.T. and R.G.K. generated 
SMYD3 and MAP3K2me antibodies, the MAP3K2 peptides, and determined the 
catalytic efficiency of SMYD3. O.G. and J.S. were equally responsible for supervision of 
research, data interpretation and manuscript preparation. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare competing financial interests: details 
are available in the online version of the paper. Readers are welcome to comment on 
the online version of the paper. Correspondence and requests for materials should be 
addressed to J.S. (julsage@stanford.edu) and O.G. (ogozani@stanford.edu). 


12 JUNE 2014 | VOL 510 | NATURE | 287 






METHODS 

Ethics statement. Mice were maintained according to practices prescribed by the 
NIH at Stanford’s Research Animal Facility accredited by the Association for Assess- 
ment and Accreditation of Laboratory Animal Care. 

Mouse strains. Kras+/LSL-G12D, p53lox/lox and p48+/Cre mice have been described 
before. Smyd3tm1a(KOMP)Wtsi mice were obtained from the KOMP Repository. 
Details on the targeted allele are available on the KOMP website. Briefly, mice 
were constructed using the ‘knockout first’ strategy. In this allele, insertion of a LacZ 
cassette with strong splice acceptor in intron 2 of the Smyd3 gene creates a knock- 
out allele serving additionally as a reporter. Expression of the Cre recombinase in 
cells removes the LacZ cassette and further deletes several Smyd3 exons, resulting 
in a null allele (Extended Data Fig. 2d). Mice were of mixed C57BL/6;129SV background 
and we systematically used randomly picked littermates as controls in all 
the experiments (sex ratio per cohort balanced). 

Pancreatic cancer mouse models. Pancreatitis-induced tumorigenesis. Acute pancreatitis 
was induced at 6 to 8 weeks of age in p48Cre/+;KrasLSL-G12D (Kras) and 
p48Cre/+;KrasLSL-G12D;Smyd3lox/lox (Kras;Smyd3) mice by administration of 8 hourly 
intraperitoneal injections of caerulein (125 µg per kg body weight; Sigma-Aldrich) over 2 
consecutive days as described previously. Mice were treated as indicated with the 
PP2A inhibitor cantharidin (0.15 mg kg⁻¹ twice a day, intraperitoneally; Sigma-Aldrich) or the MEK 
inhibitor trametinib (Selleckchem; 1 mg kg⁻¹ or 0.1 mg kg⁻¹ intraperitoneally daily) 
or vehicle (10% cyclodextran). Pancreatic lesions were analysed 7 days after the last 
injection. 

Spontaneous model of pancreatic intraepithelial neoplasia (PanIN) development. 
PanIN progression was analysed in p48Cre/+;KrasLSL-G12D (Kras) and 
p48Cre/+;KrasLSL-G12D;Smyd3lox/lox (Kras;Smyd3) mice aged for 6 months. Quantification of 
low-grade (PanIN1a and 1b) and high-grade (PanIN2 and 3) lesions was performed. Histopathological 
analysis was conducted on de-identified slides based on the classification 
consensus. Five images (×100) were taken in standardized positions (as 
to cover the whole section) for each slide. PanINs were counted from 8 independ- 
ent animals for each group. Error bars represent s.e.m. 

Model of aggressive PDAC. To study aggressive PDAC expansion, we generated 
p48Cre/+;KrasLSL-G12D;p53lox/lox (Kras;p53) and p48Cre/+;KrasLSL-G12D;p53lox/lox;Smyd3lox/lox 
(Kras;p53;Smyd3) mutant mice. Mice were followed for signs of disease progression. 
At endpoint, tumours were processed for histological and immunohistochemical 
evaluation. To calculate relative normal acini area, Kras;p53 and Kras;p53;Smyd3 
tumour sections were stained for amylase. Positive regions on six random, non-overlapping, 
×100 images were collected from 3 mice per genotype. For each image, 
positive amylase area was normalized to total pancreas tissue area using ImageJ software. 
Error bars represent means ± standard error of the mean (s.e.m.). 

Lung cancer mouse models 

Adenovirus-induced lung adenocarcinoma (LAC). KrasLSL-G12D (Kras) and 
KrasLSL-G12D;Smyd3lox/lox (Kras;Smyd3) mice were treated with 5 X 10° plaque-forming 
units of adenovirus expressing Cre (University of Iowa adenovirus core) 
by intratracheal infection as previously described. Tumours were analysed and 
quantified at 12, 16 and 20 weeks post-infection, n = 6 for each group. 
Lentivirus-induced lung adenocarcinoma. Generation of a dual promoter lentiviral 
vector for Cre and cDNA expression was described before. To determine 
the effect of reconstitution of exogenous Smyd3WT and Smyd3F183A expression on 
lung tumour progression a lentiviral vector was developed that expressed both the 
Smyd3 complementary DNA and Cre. A lentivirus expressing Cre alone was used 
as a control. Virus was produced and titred as described previously. Briefly, the 
lenti-Cre vector was co-transfected with packaging vectors into 293T cells using 
calcium phosphate. The supernatant was collected at 48 and 72 h. Concentrated 
virus was recovered by ultracentrifugation at 25,000 r.p.m. for 90 min and resuspended 
in PBS. Cohorts of KrasLSL-G12D;Smyd3lox/lox (Kras;Smyd3) mice were 
infected with each lentiviral vector. Tumour burden was analysed 24 weeks after 
lentiviral infection, n = 4 for each treatment. 

Preparation of pancreatic epithelial explant cultures. Pancreatic epithelial explants 
from 4- to 6-week-old p48Cre/+ (WT) and p48Cre/+;Smyd3lox/lox (Smyd3) mice were 
established by modification of previously published protocols. In brief, the whole 
pancreas was collected and treated twice with 1.2 mg ml⁻¹ collagenase VIII (Sigma-Aldrich). 
Following multiple wash steps with McCoy’s medium containing soybean 
trypsin inhibitor (SBTI, 0.2 mg ml⁻¹), digested samples were filtered through a 100-µm 
filter, resuspended in culture medium (Waymouth’s MB 752/1 supplemented 
with 0.1% BSA, 0.2 mg ml⁻¹ SBTI, 50 µg ml⁻¹ bovine pituitary extract, 10 µg ml⁻¹ 
insulin, 5 µg ml⁻¹ transferrin, 6.7 ng ml⁻¹ selenium in 30% FCS) and allowed to 
recover for 1 h at 37 °C. Thereafter, cells were pelleted and resuspended in culture 
medium supplemented with penicillin G (1,000 U ml⁻¹), streptomycin (100 µg ml⁻¹), 
amphotericin B (0.25 µg ml⁻¹), 0.1% FCS, and an equal volume of rat tail collagen 
type I (BD Bioscience). The cellular/rat tail collagen type I suspension was immediately 
plated on plates pre-coated with 2.5 mg ml⁻¹ of rat tail collagen type I. In 
stimulation experiments recombinant human EGF (rhEGF, Invitrogen) was added 


at a final concentration of 25 ng ml⁻¹. For quantification, acinar explants were seeded 
in triplicate. Cell clusters were counted from at least 3 optical fields per well and 
reported as a percentage of acinar clusters and duct-like spheres. The quantifica- 
tion was performed in two independent experiments; the number of mice is reported 
in the main text. 

Lentiviral transduction of pancreatic epithelial explants. For reconstitution experiments, 
fresh explants from KrasLSL-G12D;Smyd3lox/lox (Smyd3 deficient) mice were 
transduced using lentiviral vectors with or without Cre and expressing wild-type 
SMYD3 or SMYD3(F183A). Cre-mediated recombination results in oncogenic 
K-Ras activation, causing spontaneous ADM (acinar to ductal metaplasia), which 
was quantified as described above. 

Immunofluorescence of pancreatic epithelial explants. For immunofluorescent 
labelling of explanted pancreatic acini/duct clusters, collagen gels containing ex- 
planted pancreas were fixed in the chamber slides in 4:1 methanol:DMSO over- 
night at 4 °C, washed and stored at -20 °C in 100% methanol. Collagen gels were 
permeabilized with TritonX-100 0.1% (Sigma-Aldrich) for 5 min at room temper- 
ature and washed in PBS+TritonX-100 0.025% (PBST). Gels were blocked with 
5% normal goat serum in PBST+BSA 2% for 2h at room temperature, then incu- 
bated sequentially with SMYD3 primary antibody (Abcam) and Alexa Fluor 488- 
conjugated secondary antibody diluted in blocking buffer overnight at 4 °C. Following 
each antibody, gels were washed in PBST. Cells were counterstained with 50 µg ml⁻¹ 
DAPI (Invitrogen) and washed in PBS. Images were captured on a Zeiss inverted 
fluorescence microscope. Identical acquisition methods were used for all samples to 
allow direct comparison of the resulting images. 

Histology, immunohistochemistry and X-gal staining. Tissue specimens were 
fixed in 4% buffered formalin for 24h and stored in 70% ethanol until paraffin 
embedding. 3-µm sections were stained with haematoxylin and eosin (HE) or used 
for immunohistochemical studies. 

Immunohistochemistry. Immunohistochemistry was performed on formalin- 
fixed, paraffin embedded mouse and human tissue sections using a biotin-avidin 
method as described before. The following antibodies were used: rabbit anti-amylase 
(Sigma-Aldrich), pERK1/2 (Cell Signaling), MUC5 (NeoMarkers) and SMYD3 
(Abcam). Sections were developed with DAB and counterstained with haematox- 
ylin. Pictures were taken using a Zeiss microscope equipped with the AxioVision 
software. Analysis of the tumour area and IHC analysis was done using ImageJ 
software by measuring pixel units. 

X-gal staining. Staining of cryosections (8 µm) was carried out as described 
previously; slides were counterstained with nuclear fast red. 

qRT-PCR. RNA was isolated using the Qiagen RNeasy Isolation Kit followed by 
cDNA synthesis (SuperScript II, Invitrogen). Real-time PCR was performed with 
800 nM primers diluted in a final volume of 20 µl in SYBR Green Reaction Mix 
(Applied Biosystems). RT-PCRs were performed as follows: 95 °C for 10 min, 35 
cycles of 95 °C for 15 s and 60 °C for 1 min. qRT-PCR data are representative of 
4 independent mouse pancreatic epithelial explant isolations per treatment. All 
samples were analysed in triplicate. Gapdh expression was used for normalization. 
The following primers were used: 

Smyd3-For 5'-TGCGCACCATGGAGCCGTAC 

Smyd3-Rev 5'-GTCAAAGGCCAGCCTCAGGTTCT 

Gapdh-For 5'-CCCACTAACATCAAATGGGG 

Gapdh-Rev 5'-CCTTCCACAATGCCAAAGTT 
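
The text states only that samples were run in triplicate and that Gapdh was used for normalization; it does not say how relative expression was computed. As an illustration, the sketch below assumes the widely used 2^-ΔΔCt calculation, which may differ from the authors' actual analysis. The function name and all Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, control_target_ct, control_reference_ct):
    """Fold change of a target gene versus a control sample by 2^-ddCt.

    Each argument is a list of technical-replicate Ct values (e.g. triplicates).
    The 2^-ddCt method is an assumption; it presumes ~100% primer efficiency.
    """
    # Normalize the target gene to the reference gene (here, Gapdh) in each sample.
    d_ct_sample = mean(target_ct) - mean(reference_ct)
    d_ct_control = mean(control_target_ct) - mean(control_reference_ct)
    # Compare the sample with the control and convert cycles to fold change.
    return 2.0 ** -(d_ct_sample - d_ct_control)


# Hypothetical triplicate Ct values for Smyd3 and Gapdh in a treated and a control sample.
fold = relative_expression(
    target_ct=[24.0, 24.0, 24.0],
    reference_ct=[18.0, 18.0, 18.0],
    control_target_ct=[22.0, 22.0, 22.0],
    control_reference_ct=[18.0, 18.0, 18.0],
)
print(f"Smyd3 relative expression: {fold:.2f}")
```

In this made-up example the target crosses threshold two cycles later than in the control after Gapdh normalization, which corresponds to a four-fold lower expression.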

Meta-analysis of public PDAC and NSCLC data sets. We downloaded raw data 
for gene expression studies (7 pancreatic cancer, 6 NSCLC) from the NCBI GEO 
and EBI ArrayExpress. After re-annotating the probes, each data set was normal- 
ized separately using g-RMA. We applied two meta-analysis approaches to the 
normalized data. The meta-analysis approach has been recently described. Briefly, 
the first approach combines effect sizes from each data set into a meta-effect size to 
estimate the amount of change in expression across all data sets. For each gene in 
each data set, an effect size was computed using Hedges’ adjusted g. If multiple 
probes mapped to a gene, the effect size for each gene was summarized using the 
fixed effect inverse-variance model. We combined study-specific effect sizes to obtain 
the pooled effect size and its standard error using the random effects inverse-variance 
technique. We computed z-statistics as a ratio of the pooled effect size to its standard 
error for each gene, and compared the result to a standard normal distribution to 
obtain a nominal P value. P values were corrected for multiple hypotheses testing 
using false discovery rate (FDR). We used a second non-parametric meta-analysis 
that combines P values from individual experiments to identify genes with a large 
effect size in all data sets. Briefly, we calculated a t-statistic for each gene in each study. 
After computing one-tail P values for each gene, they were corrected for multiple 
hypotheses using FDR. Next, we used Fisher’s sum of logs method, which sums the 
logarithm of corrected P values across all data sets for each gene, and compares the 
sum against a chi-square distribution with 2k degrees of freedom, where k is the num- 
ber of data sets used in the analysis. 
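
The two approaches described above can be sketched in a few lines for a single gene.
This is a minimal illustration under stated assumptions, not the authors' pipeline: the
function names are hypothetical, the random-effects step uses the DerSimonian-Laird
estimator (the text only says "random effects inverse-variance technique"), and the
nominal P value is taken as two-sided.

```python
import math


def hedges_g(mean_case, mean_ctrl, sd_case, sd_ctrl, n_case, n_ctrl):
    """Hedges' adjusted g and its variance for one gene in one data set."""
    df = n_case + n_ctrl - 2
    s_pooled = math.sqrt(((n_case - 1) * sd_case**2 + (n_ctrl - 1) * sd_ctrl**2) / df)
    d = (mean_case - mean_ctrl) / s_pooled
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    var_d = (n_case + n_ctrl) / (n_case * n_ctrl) + d**2 / (2.0 * (n_case + n_ctrl))
    return j * d, j**2 * var_d


def random_effects_pool(effects, variances):
    """Pool study effect sizes (assumed: DerSimonian-Laird random effects).

    Returns (pooled effect, standard error, z, two-sided normal P value).
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, se, z, p


def fisher_combined_p(p_values):
    """Fisher's sum of logs: -2 * sum(ln p) against chi-square with 2k d.f."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # equals (chi-square statistic) / 2
    # Closed-form chi-square survival function, valid for even d.f. (2k)
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))
```

Multiple-testing correction (FDR) would then be applied across genes to the P values
returned by either function; that step is omitted here to keep the sketch short.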

Plasmids. Bacterial expression plasmids were created using pGEX-6P1 vector. Transient
mammalian expression plasmids were created using pCDNA3.1 HA, pCDNA3.1
Myc and pCagFlag vectors. The different inserts were amplified by PCR using either
cDNA or specific clones from the human ORFeome library as template. Single-point
mutations of SMYD3 and MAP3K2 were generated using the QuikChange
site-directed mutagenesis protocol (Stratagene), and clones were confirmed by DNA
sequencing. SMYD3 and MAP3K2 shRNAs targeting UTR regions were cloned in
a pSICOR vector carrying a puromycin resistance gene.

Human SMYD3 shRNA sequence directed against the 3' UTR:
TGCGTGTGTCTTTGTTGAATTTCAAGAGAATTCAACAAAGACACACGCTTTTTTC.

Human MAP3K2 shRNA sequence directed against the 3' UTR:
TGGATGATTTCACTAGGCATTTCAAGAGAATGCCTAGTGAAATCATCCTTTTTTC.

Mouse Smyd3 shRNA sequence directed against the 3' UTR:
TGAGCAGAACCATTACAATATTCAAGAGATATTGTAATGGTTCTGCTCTTTTTTC.

Mouse Map3k2 shRNA sequence directed against the 3' UTR:
TAGTCATAGCTATAGTGAAATTCAAGAGATTTCACTATAGCTATGACTTTTTTTC.

SMYD3 and MAP3K2 stable reconstitution plasmids were created using the Gateway
cloning system according to the manufacturer’s instructions (Invitrogen) with either
the WT or point mutant constructs, into pMSCV-Flag and pBabe-Flag vectors
(hygromycin resistance).
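
The four shRNA oligonucleotides listed above appear to share one hairpin layout: a
leading T, a sense arm, a TTCAAGAGA loop, the reverse-complement antisense arm, and a
poly-T terminator ending in C. That layout is inferred from the sequences themselves
and is not stated in the text, so the following check is only a sketch; the helper
names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Oligo sequences as printed in the Methods (each split where the text wraps)
SHRNA_OLIGOS = {
    "human SMYD3": "TGCGTGTG" "TCTTTGTTGAATTTCAAGAGAATTCAACAAAGACACACGCTTTTTTC",
    "human MAP3K2": "TGGATGA" "TTTCACTAGGCATTTCAAGAGAATGCCTAGTGAAATCATCCTTTTTTC",
    "mouse Smyd3": "TGAGCAGAACC" "ATTACAATATTCAAGAGATATTGTAATGGTTCTGCTCTTTTTTC",
    "mouse Map3k2": "TAGTCATAGC" "TATAGTGAAATTCAAGAGATTTCACTATAGCTATGACTTTTTTTC",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_consistent_hairpin(oligo, loop="TTCAAGAGA"):
    """Check the assumed layout: T + sense + loop + antisense + poly-T + C."""
    sense, found, rest = oligo[1:].partition(loop)  # oligo[1:] skips the leading T
    if not found:
        return False
    antisense = reverse_complement(sense)
    if not rest.startswith(antisense):
        return False
    tail = rest[len(antisense):]
    # Terminator: a run of at least four T residues followed by a single C
    return tail.endswith("C") and len(tail) >= 5 and set(tail[:-1]) == {"T"}


results = {name: is_consistent_hairpin(seq) for name, seq in SHRNA_OLIGOS.items()}
```

A check of this kind is a quick way to catch transcription errors when a hairpin
sequence has been copied across a line break.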

Cell culture, reagents and transfections. K-Ras mutant lung and pancreatic carcinoma
lines LKR10 (mouse lung), A549 (human lung) and CFPac1 (human pancreas)
were used (all these cell lines are wild-type for EGFR; see below and the COSMIC
database for the human cell lines). 293T, LKR10, A549 and CFPac1 cells were grown
in Dulbecco’s modified Eagle’s medium (GIBCO) supplemented with 10% fetal calf
serum (FCS, GIBCO), 100 units ml⁻¹ penicillin/streptomycin and glutamine. All
cells were cultured at 37 °C in a humidified incubator with 5% CO2. For transient
expression, cells were transfected using Mirus 293T transfection reagent and
collected 24 to 36 h later. For stable knockdown, cells were transduced with lentiviral
shRNA constructs using the packaging vectors pGagpol and pΔ8.2, followed by
2 µg ml⁻¹ puromycin selection for one week. For rescue experiments, cells were
transduced with retroviral pBabe and pMSCV constructs using the packaging vectors
pGag and pVSVg, followed by 100 µg ml⁻¹ hygromycin selection for one week.

Serum and EGF stimulation were performed after 48 h of serum starvation using
either regular 10% FBS cell medium or 25 ng µl⁻¹ of rhEGF (Promega) for 15 min.

EGFR sequencing. LKR10 DNA was isolated. Egfr exons 18 to 21 were amplified
using a proofreading polymerase (PfuUltra, Invitrogen) and sequenced bi-directionally.
Sequence was verified based on transcript ID: ENSMUST00000020329. The
following primers were used:
mEGFR Exon 18: For 5'-CTCTGGCTCAGAATGAATCTAC, Rev 5'-GAAGCCTAGTGCGGACCTGTC, product: 268 bp.
mEGFR Exon 19: For 5'-CCAGCTCACAAGGCAACATG, Rev 5'-CTAAGGAAGCAAGATTGACC, product: 229 bp.
mEGFR Exon 20: For 5'-GATTCATCTATTGTCCTTACC, Rev 5'-TGGGTACTTCAGTGGACAGAC, product: 234 bp.
mEGFR Exon 21: For 5'-CATGACACTGAGGATGCCCAGA, Rev 5'-CAAATGCTGCCCACAGCTGAC, product: 298 bp.

Cell extracts, immunoprecipitation and immunoblot analysis. For total cell
extracts, cells were lysed in RIPA buffer (10 mM Tris-HCl pH 8, 140 mM NaCl, 1 mM
EDTA, 0.5 mM EGTA, 1% Triton, 0.1% SDS, 1 mM PMSF, protease inhibitors
(Roche) and a phosphatase inhibitor cocktail (Sigma-Aldrich)) for 15 min. Cell
fractionation was performed by collecting supernatant (cytoplasmic fraction) after
5 min of 1,300g centrifugation following a 10 min incubation in hypotonic buffer
(10 mM HEPES pH 7.9, 10 mM KCl, 1.5 mM MgCl2, 0.34 M sucrose, 10% glycerol,
1 mM DTT, 0.05% Triton, and protease inhibitors). The pellet was then incubated for
15 min in LSDB250 buffer (20% glycerol, 3 mM MgCl2, 50 mM HEPES pH 7.9,
250 mM KCl, 0.5 mM DTT, 0.5 mM PMSF, 0.1% NP40, protease inhibitors), and
centrifuged at 15,000g for 10 min. The supernatant was collected as soluble nuclear
extract, and the pellet was further extracted in LSDB250 buffer with sonication
(chromatin fraction). Protein concentration was determined by the BCA assay (Pierce).

For immunoprecipitation, cells were lysed in either LSDB250 buffer for smooth
total cell extract, or hypotonic buffer for cytoplasmic extract, and equal amounts of
protein extract were incubated with specific antibody overnight at 4 °C. Extracts
were then incubated with protein A Sepharose beads (GE Healthcare) for 2 h at
4 °C.

Proteins were resolved by SDS-PAGE, transferred to nitrocellulose membrane and
analysed by immunoblot. Antibodies used were as follows: SMYD3 and MAP3K2me2/3
(generated by Yenzym); GST (generated by Covance); beta-tubulin (05-661 Millipore);
MAP3K2 (1662-1 Epitomics); Flag and ERK5 (F1804, E1523, Sigma); pERK1/2,
ERK1/2, pMEK1/2, MEK1/2, pJNK, JNK, pP38, P38, pERK5, pAKT, AKT, pCRaf,
CRaf and PPP2R2a (4370, 4695, 9121, 9122, 9251, 9252, 9211, 9212, 3371, 2965,
4685, 9427, 9422, 5689, Cell Signaling); HA and Myc (26183, 21316 Pierce).
Immunoblot signal intensity was measured using ImageJ software. Quantification
data are expressed as s.e.m.

Expression and purification of recombinant proteins. For expression of GST-tagged
recombinant proteins, transformed BL21 cells were induced with 0.1 mM
IPTG overnight at 20 °C, and proteins were purified using glutathione Sepharose
beads (GE Healthcare) and eluted in 10 mM reduced glutathione (Sigma) or cleaved
from the GST tag using purified Precision enzyme. Recombinant human histone
H3 and H3K4* mutant (first thirteen lysines mutated to arginine except lysine K4)
were expressed and purified as previously described.

ProtoArray, methylation and kinase assays. In vitro methylation assays were
performed using 1 to 2 µg of recombinant proteins or peptides incubated with 1 µg
of recombinant methyltransferases and 0.1 mM of S-adenosyl-methionine (SAM,
Sigma) or 2 µCi 3H-AdoMet (American Radiolabelled Chemicals) in buffer
containing 50 mM Tris-HCl (pH 8.0), 10% glycerol, 20 mM KCl, 5 mM MgCl2, and
1 mM PMSF at 30 °C overnight. The reaction mixture was resolved by SDS-PAGE,
followed by autoradiography, Coomassie stain (Pierce) or mass spectrometry analysis.
Kinetic constants kcat and KM were determined using Grafit (Erithacus software).
SMYD3 methylation activity on histone H4 (EMD Millipore, USA) and MAP3K2
(Origene, USA) was assessed by radiometric assays using [3H]SAM with specific
product capture on arginine-binding SPA beads for assays using histone substrates
or RNA-binding SPA beads when MAP3K2 was the substrate (PerkinElmer, USA).
Assay conditions were 20 mM Tris pH 8, 3 mM DTT, 50 µM ZnCl2, 0.005% Tween-20,
1.2 µM unlabelled SAM, 0.2 µM [3H]SAM and 25 nM SMYD3 final. Reactions
were quenched using 2 mM unlabelled SAM. SPA signal was quantified in a Microbeta
scintillation counter (PerkinElmer, USA). Data were fit to the Michaelis-Menten
equation, where the rate was plotted as a function of the concentration of substrate.
To verify the activity of the other lysine methyltransferases tested, in vitro
methylation assays were performed using known substrates, as previously reported.
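
For readers unfamiliar with the Michaelis-Menten fit mentioned above, the sketch below
estimates Vmax and KM from rate versus substrate data, and converts Vmax to kcat using
the enzyme concentration. It is an illustration only: it uses a Hanes-Woolf
linearization with ordinary least squares as a simple stand-in for the nonlinear
regression a package such as Grafit would perform, and the function names and data
points are hypothetical.

```python
def fit_michaelis_menten(substrate, rate):
    """Estimate (Vmax, Km) from the Hanes-Woolf form S/v = S/Vmax + Km/Vmax."""
    y = [s / v for s, v in zip(substrate, rate)]
    n = len(substrate)
    mean_x = sum(substrate) / n
    mean_y = sum(y) / n
    sxx = sum((s - mean_x) ** 2 for s in substrate)
    sxy = sum((s - mean_x) * (yi - mean_y) for s, yi in zip(substrate, y))
    slope = sxy / sxx                      # equals 1 / Vmax
    intercept = mean_y - slope * mean_x    # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


def catalytic_efficiency(vmax, km, enzyme_conc):
    """Return (kcat, kcat/Km); vmax and enzyme_conc must share concentration units."""
    kcat = vmax / enzyme_conc
    return kcat, kcat / km


# Hypothetical noise-free data generated from Vmax = 2.0 and Km = 5.0
substrate = [1.0, 2.0, 5.0, 10.0, 20.0]
rate = [2.0 * s / (5.0 + s) for s in substrate]
vmax, km = fit_michaelis_menten(substrate, rate)
```

With real, noisy measurements a weighted nonlinear fit is generally preferable, because
linearized forms distort the error structure of the data.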

In vitro kinase assays were performed by incubating 1 µg of recombinant MAP3K2
WT or mutants with 1 µg of recombinant MEK1 in kinase buffer containing 25 mM
Tris-HCl (pH 7.5), 5 mM β-glycerophosphate, 2 mM dithiothreitol (DTT), 0.1 mM
Na3VO4, 10 mM MgCl2 and 200 µM ATP (Cell Signaling) at 37 °C for 30 min.

Peptide pull-down and SILAC. MAP3K2 peptides were generated by 21st Century
Biomedicals and are based on the following sequence:
DYDNPIFEKFGKGGTYPRRYHVSYHG[K-Biot-Ahx]-amide.

For peptide pull-downs, 5 to 10 µl of streptavidin Sepharose beads (GE Healthcare)
were saturated with 10 µg of specific biotinylated peptides for 2 h at 4 °C under
rotation in peptide buffer (50 mM Tris pH 7.5, 150 mM NaCl, 1% NP-40), then
washed 3 times in the same buffer. Beads were then incubated with either 1 µg
of recombinant proteins or 1 mg of HeLa cytoplasmic extract for 4 h at 4 °C under
rotation in peptide buffer. Beads were then washed 3 times in the same buffer and
resuspended in Laemmli buffer for immunoblot analysis.

For SILAC peptide pull-down, HeLa cytoplasmic extracts were prepared from
cells cultivated either with normal amino acids (‘light’) or with modified
amino acids (‘heavy’). A two-way experiment was performed:
the ‘forward’ condition combined MAP3K2-K260me0 peptide with light extract
and MAP3K2-K260me3 peptide with heavy extract; the ‘reverse’ condition
combined MAP3K2-K260me3 peptide with light extract and MAP3K2-K260me0
peptide with heavy extract. Beads of each pair of peptide pull-downs were then pooled
together and washed, and extracts were resuspended in Laemmli buffer and resolved by
SDS-PAGE. In-gel trypsin digestion was performed and peptides were purified using
C18 stage tips (Fisher) before mass spectrometry analysis to quantify the ratio of
each potential binder for K260me0 and K260me3 peptides, in the forward and reverse
conditions. To identify outliers in both the forward and the reverse experiment,
boxplot statistics were applied (cut-off = 1.5× interquartile range). Proteins
identified as outliers in both experiments were assigned as significant interactors. Amino
acid complements used for SILAC were L-lysine-2HCl (Thermo Scientific 88429),
L-arginine-HCl (Thermo Scientific 88427), 2H4-L-lysine-2HCl (Thermo Scientific
88438), 13C6-L-arginine-HCl (Thermo Scientific 88433) and L-proline (Thermo Scientific
88430).
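
The boxplot criterion described above can be sketched as follows. This is an
illustration under assumptions that the text does not specify: ratios are taken to be
log2-transformed, quartiles are computed with the "inclusive" method of the Python
standard library, and the protein names and values are invented.

```python
from statistics import quantiles


def boxplot_outliers(ratios, k=1.5):
    """Return the proteins whose ratio lies more than k x IQR beyond the quartiles."""
    q1, _, q3 = quantiles(list(ratios.values()), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {name for name, value in ratios.items() if value < low or value > high}


def significant_interactors(forward, reverse, k=1.5):
    """Proteins flagged as outliers in both the forward and the reverse experiment."""
    return boxplot_outliers(forward, k) & boxplot_outliers(reverse, k)


# Hypothetical log2 heavy/light ratios; "X" changes strongly in both label swaps
forward = {"A": 0.10, "B": -0.20, "C": 0.00, "D": 0.15,
           "E": -0.10, "F": 0.05, "G": -0.05, "X": 3.00}
reverse = {"A": -0.10, "B": 0.20, "C": 0.00, "D": -0.15,
           "E": 0.10, "F": -0.05, "G": 0.05, "X": -2.80}
hits = significant_interactors(forward, reverse)
```

Requiring a protein to be an outlier after the label swap helps to exclude
contaminants, whose ratio does not invert when the heavy and light extracts are
exchanged. A stricter variant would also require the sign of the log ratio to flip
between the two experiments.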

Active Ras pull-down and detection. Ras activity refers to the level of guanosine
triphosphate-bound Ras, which is able to bind the Ras-binding domain (RBD) of RAF-1,
as measured using an RBD pull-down assay kit as recommended by the
manufacturer (The Active Ras Pull-Down and Detection Kit, Thermo Scientific).
Briefly, tumour biopsies were homogenized on ice in lysis buffer containing 25 mM
4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (pH 7.5), 1% Igepal CA-630,
150 mM NaCl, 0.25% sodium deoxycholate, 10% glycerol, 25 mM NaF, 10 mM MgCl2,
1 mM EDTA, 10 µg ml⁻¹ aprotinin, 10 µg ml⁻¹ leupeptin and 1 mM sodium
orthovanadate. These samples were sonicated and centrifuged at 15,000g for 10 min at
4 °C to remove cellular debris. Protein concentration was measured. Equal amounts
of lysate were incubated for 30 min at 4 °C with agarose beads coated with RBD. The
beads were then washed three times with ice-cold lysis buffer and boiled for 5 min at
95 °C, and active Ras was analysed by immunoblotting following a standard
protocol using Ras-specific antibodies (Thermo Scientific). For comparison to total
Ras protein, 2% of the total lysate used for pull-down was analysed by immunoblot.

Cancer xenografts. For xenograft analysis, 500,000 CFPac1 cells were injected into
the flank of NSG mice with Matrigel (BD Bioscience). Tumour volume was
measured at the times indicated and calculated using the ellipsoid formula
(length × width²).

Cell assays. Anchorage-independent growth was assessed in soft agar assays. Cell
proliferation was assessed by counting cell number at indicated days and expressed
relative to the control as previously described. Cell viability in response to
treatment with the MEK inhibitor trametinib (Selleckchem) was measured by an MTT
assay (Roche) according to the manufacturer’s instructions.

Statistics. Kaplan-Meier survival curves were calculated using the survival time for
each mouse from all littermate groups. The log-rank test was used to test for
significant differences between the groups. For image quantification and gene
expression analysis, statistical significance was assayed by Student’s t-test with the Prism
GraphPad software (two-tailed unpaired and paired t-test depending on the
experiment; variance was first systematically examined using an F-test). *P value < 0.05;
**P value < 0.01; ***P value < 0.001; ns, not significant. Data are represented as
mean ± standard error of the mean (s.e.m.).

Extended Data Figure 1 | SMYD3 is a highly overexpressed KMT in
Ras-associated cancers. a, Analysis of seven publicly available human
pancreatic ductal adenocarcinoma (PDAC) gene expression studies from the
NCBI GEO and EBI ArrayExpress for SMYD3 levels. The red line indicates
expression of SMYD3 in pancreatic cancer biopsies (n = 203); the blue line
marks normal pancreas samples (n = 91). The scale shows relative expression
levels (log2). b, A bioinformatics meta-analysis identified five lysine
methyltransferases overexpressed in human pancreatic ductal adenocarcinoma
(PDAC). Meta-effect size and statistical tools are described in the Methods.
FDR, false discovery rate. c, d, Summary of SMYD3 expression levels in seven
(n = 294 independent samples) publicly available expression data sets of
PDAC and six data sets (n = 319 tumours and n = 147 normal independent
samples) of non-small cell lung cancer (NSCLC), respectively. Detailed
statistical description in the Methods section.

Extended Data Figure 2 | Analysis of SMYD3 expression in human and
mouse PDAC and lung adenocarcinoma (LAC). a, Immunohistochemical
analysis of SMYD3 expression in mouse and human WT pancreas, PanIN
lesions, and PDAC. The expression pattern was further analysed using a
Smyd3LacZ reporter knock-in strain. Smyd3LacZ mice were crossed to
p48;KrasG12D (Kras) mice and studied at progressing stages of disease. Analysis
of LacZ activity by X-gal staining as a surrogate for Smyd3 expression is shown
(lower panel) (see Extended Data Fig. 3 for a cartoon of the knock-in allele).
b, Immunoblot analysis with the indicated antibodies on tumour biopsy lysates
from wild-type pancreas and from the pancreas of Kras mutant mice at 4.5
and 9 months of age when mice develop PanIN and PDAC, respectively (each
time point represents two biological replicates). c, IHC analysis of SMYD3
expression in normal lung, atypical adenomatous hyperplasia (AAH), and lung
adenocarcinoma (LAC). X-gal analysis of LacZ activity in Kras-driven tumours
with the Smyd3LacZ reporter strain (lower panel). All images shown are
representative. Arrowheads indicate nuclear localization of SMYD3. Scale bars,
50 µm. d, Smyd3 knockout allele diagram. In this allele, insertion of a LacZ
cassette with a strong splice acceptor in intron 2 of the Smyd3 gene creates a
mutant allele serving additionally as a reporter (Smyd3LacZ). Expression of the
Cre recombinase in cells removes the LacZ cassette and further deletes Smyd3
exon 2, resulting in a null allele, Smyd3KO. SA, splice acceptor; pA,
polyadenylation signal.

Extended Data Figure 3 | Smyd3 deletion inhibits pancreatic
tumorigenesis. a, Analysis of pancreatic tumorigenesis at 6 months in Kras
and Kras;Smyd3 mutant mice. Representative serial histology sections (HE) and
IHC for pERK1/2 and the PanIN marker MUC5. b, Pancreatic cancer
phenotypes in Kras;p53 and Kras;p53;Smyd3 mutant mice. Representative IHC
for pERK1/2. Arrowheads indicate areas with intact acinar cells.
c, Quantification of intact normal acinar area (amylase-positive area) in
Kras;p53 and Kras;p53;Smyd3 mutant mice. Data are represented as
mean ± s.e.m. ***P value < 0.001 (two-tailed unpaired Student’s t-test).
d, Representative HE and pERK1/2 IHC images of lung sections from Kras
and Kras;Smyd3 mutant mice 12, 16 and 20 weeks after Ad-Cre infection.
pERK1/2 is a marker of Ras activity and advanced tumours. e, f, IHC analysis
of SMYD3 expression in the PDAC (e) and LAC (f) mouse models.
Arrowheads indicate cytoplasmic localization of SMYD3. Scale bars, 50 µm.
g, h, Immunoblot analysis with the indicated antibodies probing pancreatic
adenocarcinoma (g) or lung adenocarcinoma tumour lysates (h) dissected from
Kras and Kras;Smyd3 mutant mice. Active Ras corresponds to Ras protein in
the GTP-bound state pulled down with the RAF Ras-binding domain (RBD)
(see Methods). *Tubulin loading control as in Fig. 1j and 2f, respectively.


©2014 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 4 | SMYD3 functions to maintain the tumorigenic
characteristics of human and murine cancer cells. a-c, Cell proliferation rates
(top panels) and colony formation in soft agar assays (bottom panels) of murine
LAC cell line LKR10 (a), human LAC cell line A549 (b), or human PDAC
cell line CFPac1 (c) with or without SMYD3 depletion by stable shRNA
(respective immunoblot in middle panels). d-f, SMYD3 depletion in CFPac1
attenuates tumour growth in mouse xenografts. d, Macroscopic picture of
the experiment. Scale bar, 1 cm. e, Volume analysis shows that shSMYD3
significantly inhibits the expansion of pancreatic tumours (n = 6 for each
group). f, HE of the tumours and IHC confirmation of SMYD3 expression
and knock-down. All scale bars, 50 µm. *P value < 0.05; **P value < 0.01;
***P value < 0.001 (two-tailed unpaired Student’s t-test). Data are represented
as mean ± s.e.m.

Extended Data Figure 5 | Lentiviral reconstitution of SMYD3 in pancreatic
acinar-to-ductal-metaplasia (ADM) assays and in lung cancer cells in vivo.
a, IHC analysis of SMYD3 reconstitution in the lung (from Fig. 3a).
b, Immunofluorescent detection of SMYD3 expression in wild-type and
transduced acinar clusters (left panel). Acini (asterisk) transduced with
SMYD3(F183A) undergo ADM and form ducts (arrowhead) ex vivo.
c, Quantification of acinar and ductal clusters after lentiviral infection (each
treatment represents four independent biological replicates). Data are
represented as mean ± s.e.m. *P value < 0.05 (two-tailed unpaired Student’s
t-test).

Extended Data Figure 6 | SMYD3 specifically methylates MAP3K2 at lysine
260 in vitro. a, SMYD3 methylates MAP3K2 on protein arrays. Representative
image (n = 3 independent experiments) showing a SMYD3 methylation
assay on a ProtoArray. The close-up shows the two independent MAP3K2
spots on the array being methylated. b, SMYD3 is detected in the cytoplasm and
not the nucleus in LKR10 cells. Immunoblot analysis with the indicated
antibodies of LKR10 cell lysates biochemically separated into cytoplasmic,
nuclear and chromatin fractions (see Methods). c, SMYD3 catalytic efficiency is
two orders of magnitude greater on MAP3K2 than on H4. kcat, KM and kcat/KM
values of SMYD3 activity on recombinant H4 and MAP3K2 as substrates
are shown. d, Schematic of the H3K4* mutant form used in e. Note that the
only lysine available to be methylated in H3 is present at K4. e, In vitro
methylation assay on full-length recombinant MAP3K2, H3 or H3K4* with
recombinant SMYD3 and PRDM9. Top panels, short and long exposure
autoradiograms of the methylation assay. No signal was detected for SMYD3
on H3 and H3K4* after long exposures. The asterisk and line indicate
breakdown products of MAP3K2 that contain K260 and can be detected in this
methylation assay upon long exposure. Bottom panel, Coomassie stain of
proteins in the reaction. f, Positive control of activity for enzymes used in Fig. 3f,
g on their known respective substrates (MAP3K2, histone H3, nucleosome or
RelA as indicated). g, In vitro methylation assays using MAP3K2-K260me0,
me1, me2 or me3 peptides as SMYD3 substrates. Dot blot is shown as a control for
the comparable peptide concentrations used in the methylation assay. h, Mass
spectrometry analysis of SMYD3 methylation activity on unmodified
MAP3K2-K260 peptide. i, Specificity of the indicated MAP3K2-K260me
antibodies in dot blot assays using MAP3K2-K260me0, me1, me2 or me3
peptides. j, MAP3K2 is methylated in cells upon SMYD3 overexpression.
Immunoblot analysis with the indicated antibodies from 293T cell lysates after
Flag immunoprecipitation in cells overexpressing Flag-MAP3K2 and/or
HA-SMYD3.

Extended Data Figure 7 | SMYD3 and MAP3K2 knockdown both impair
MAP kinase signalling. a-d, Immunoblot analysis with the indicated
antibodies of LKR10 (a, b), A549 (c), and CFPac1 (d) lysates. Asterisk indicates
a slower migrating ERK5 species that is phosphorylated. Stimulation, 10%
serum-complemented media for 15 min (b) or EGF for 15 min at 25 ng µl⁻¹
(a, c, d). Immunoblots are representative of 3 independent biological replicates.
e, f, SMYD3 methylation of MAP3K2 does not alter the intrinsic kinase activity
of MAP3K2. e, Immunoblot analysis with the indicated antibodies from
lysates of 293T cells transfected with control vector, wild-type SMYD3,
catalytically dead SMYD3(F183A), wild-type MAP3K2, MAP3K2(K260A),
or kinase dead MAP3K2(K385M). f, Methylation of MAP3K2 does not alter its
in vitro kinase activity. In vitro kinase assays were performed with the indicated
recombinant versions of MAP3K2 (wild-type, SMYD3-resistant K260A
mutant, or kinase dead K385M mutant) pre-methylated with wild-type
SMYD3 or, as a control, inactive SMYD3, using MEK1 as a substrate. MEK1
phosphorylation was detected by immunoblot analysis with the indicated
antibody.

Extended Data Figure 8 | SMYD3 knockout augments the effects of the
MEK1/2 inhibitor trametinib (GSK1120212) in vivo. a, Schematic of the
caerulein pancreatitis-induced tumorigenesis protocol. Mice were treated with
a normal dose of trametinib (1 mg per kg intraperitoneally daily) or a low dose
(0.1 mg per kg intraperitoneally daily) or vehicle control. b, Immunoblot
analysis with indicated antibodies of two independent pancreas biopsies
per treatment group. c, Quantification of MUC5-positive lesions in
caerulein-treated pancreata from Kras and Kras;Smyd3 mutant mice treated
with trametinib or vehicle control (n = 5, each treatment). *P value < 0.05;
***P value < 0.001 (two-tailed unpaired Student’s t-test). Data are represented
as mean ± s.e.m. d, Representative macroscopic pictures of pancreata from
each treatment group. Scale bar, 1 cm. e, Representative serial HE staining
and IHC for pERK1/2, a marker of Ras activity, and MUC5, a marker of PanIN
lesions. All scale bars, 50 µm.

Extended Data Figure 9 | SMYD3 depletion augments the effects of the
MEK1/2 inhibitor trametinib (GSK1120212) in Ras-driven cancer cells.
a-c, Relative cell proliferation rates (bottom panel) of murine LAC cell line
LKR10 (a), human LAC cell line A549 (b), or human PDAC cell line CFPac1
(c) with or without SMYD3 depletion by stable shRNA (SMYD3 protein levels
are shown in top panel) in response to the indicated doses of trametinib.
Experiments shown represent an average of 3 independent experiments
performed in triplicate for each cancer line. Values represent the number
of cells relative to control shRNA cells without treatment at 48 h.
d, Constitutively active MEK1 (MEK1-DD) increases EGF-mediated ERK1/2
activation in SMYD3-depleted cells. Immunoblot analysis with the indicated
antibodies using lysates from A549 cells stably expressing shControl or
shSMYD3 and transfected with HA-MEK1-DD. Stimulation: EGF treatment
for 15 min at 25 ng µl⁻¹.

Extended Data Figure 10 | Treatment with the PP2A inhibitor cantharidin
phenocopies SMYD3 function in vivo. a, Schematic of the caerulein
pancreatitis-induced tumorigenesis protocol. Mice were treated with the PP2A
inhibitor cantharidin (iPP2A, 0.15 mg kg⁻¹ intraperitoneally twice a day) or
vehicle control. b, Immunoblot analysis with indicated antibodies on two
independent pancreas biopsies per treatment group. c, Macroscopic pictures of
WT and Kras;Smyd3 mutant pancreata. Note that treatment with the PP2A
inhibitor leads to the development of enlarged, ‘hard’ pancreata characteristic
of tumorigenic development even in Kras;Smyd3 mutant mice. Scale bar, 1 cm.
d, Representative serial haematoxylin and eosin (HE) staining and IHC for
pERK1/2, a marker of Ras activity, and MUC5, a marker of PanIN lesions. All
scale bars, 50 µm. e, Summary model for SMYD3 regulation of MAP kinase
signalling after MAP3K2 methylation. Oncogenic Ras activates several kinase
cascades that play important roles in pancreas and lung cancer development,
including four major MAPK pathways (ERK1/2, ERK5, JNK, and p38) as well
as AKT signalling. SMYD3 is frequently overexpressed in pancreatic and lung
cancers, two cancer types that are commonly driven by oncogenic Ras
signalling. Overexpression of SMYD3 and the resulting methylation of
MAP3K2 at K260 potentiate activation of kinases like ERK1/2 and ERK5 in
response to stimuli like oncogenic Ras. We postulate a mechanism in which the
PP2A complex is unable to bind methylated MAP3K2, which decreases the
ability of this enzyme to terminate activating phosphorylation events on
MAP3K2 and/or MAP3K2 downstream targets. Under conditions with
excessive SMYD3 protein, the physiological relationship between PP2A and
MAP3K2 is disrupted, resulting in increased pathological MAP3K2
signalling, which cooperates with Ras to promote tumorigenesis.

CTP synthase 1 deficiency in humans reveals its
central role in lymphocyte proliferation

Emmanuel Martin, Noé Palmic, Sylvia Sanquer, Christelle Lenoir, Fabian Hauck, Cédric Mongellaz, Sylvie Fabrega,
Patrick Nitschké, Mauro Degli Esposti, Jeremy Schwartzentruber, Naomi Taylor, Jacek Majewski, Nada Jabado,
Robert F. Wynn, Capucine Picard, Alain Fischer, Peter D. Arkwright & Sylvain Latour

Lymphocyte functions triggered by antigen recognition and co-stimulation
signals are associated with rapid and intense cell division,
and hence with metabolic adaptation. The nucleotide cytidine
5' triphosphate (CTP) is a precursor required for the metabolism of
DNA, RNA and phospholipids. CTP originates from two sources:
a salvage pathway and a de novo synthesis pathway that depends on
two enzymes, the CTP synthases (or synthetases) 1 and 2 (CTPS1 and
CTPS2); the respective roles of these two enzymes are not known.
CTP synthase activity is a potentially important step for DNA synthesis
in lymphocytes. Here we report the identification of a loss-of-function
homozygous mutation (rs145092287) in CTPS1 in humans
that causes a novel and life-threatening immunodeficiency, characterized
by an impaired capacity of activated T and B cells to proliferate
in response to antigen receptor-mediated activation. In contrast,
proximal and distal T-cell receptor (TCR) signalling events and responses
were only weakly affected by the absence of CTPS1. Activated
CTPS1-deficient cells had decreased levels of CTP. Normal T-cell
proliferation was restored in CTPS1-deficient cells by expressing wild-type
CTPS1 or by addition of exogenous CTP or its nucleoside precursor,
cytidine. CTPS1 expression was found to be low in resting T
cells, but rapidly upregulated following TCR activation. These results
highlight a key and specific role of CTPS1 in the immune system by
its capacity to sustain the proliferation of activated lymphocytes during
the immune response. CTPS1 may therefore represent a therapeutic
target of immunosuppressive drugs that could specifically dampen
lymphocyte activation.

We initially studied two unrelated families (family 1 and 2) originating
from the northwest region of England, whose four children suffered
from severe and recurrent Epstein-Barr virus (EBV) infection, and in whom
known primary immunodeficiencies had been excluded (Fig. 1a and
Table 1). Four additional patients (family 3 to 5) originating from the
same geographical area were identified thereafter (Methods). All patients
had early onset of severe chronic viral infections, mostly caused by herpes
viruses, including EBV and varicella zoster virus (VZV), and also suffered
from recurrent encapsulated bacterial infections, a spectrum of
infections typical of a combined deficiency of adaptive immunity (CID)
(Table 1 and data not shown). Two patients (P4 and P5) had EBV-driven
B-cell non-Hodgkin lymphoma. Overall, the clinical phenotype is severe,
with 3 patients having died. Six of 8 patients have undergone haematopoietic
stem cell transplantation. Of note, none of the patients had
extra-haematopoietic manifestations (Table 1).

Immunological investigations showed that most patients had variable
lymphopenia, which was exacerbated during infection episodes with an
inverted CD4:CD8 T-cell ratio, whereas other blood cell counts were
usually normal (Extended Data Table 1 and data not shown). Their immunoglobulin
levels were normal or elevated with increased IgG but low
IgG2 levels with low antibody titres to Streptococcus pneumoniae. Further
analyses were performed in patient P1.2, showing naive CD4+
T-cell lymphopenia, increased numbers of effector memory T cells, low
numbers of memory CD27+ B cells, a complete absence of both invariant
T cell populations, (CD3+Vα24+Vβ11+) iNKT and (CD3+CD161+Vα7.2+)
MAIT cells, as well as an impaired PHA- and antigen-induced
proliferation of peripheral blood mononuclear cells (PBMCs) (Extended
Data Table 2).

To identify the gene defect underlying the immunodeficiency in
these patients, we performed whole-exome sequencing (WES) in three
patients (P1.1, P1.2 and P2.1). Intersection of the genetic variations
found in the three patients pointed to a unique common homozygous
G to C mutation in the CTPS1 gene encoding the CTP synthase 1 at position
41475832 in chromosome 1 with an assigned rsID (rs145092287)
in the dbSNP database (Fig. 1b and Extended Data Fig. 1a, b). CTPS1
encodes a 67-kDa protein containing a CTP synthetase domain and a
glutamine amide transfer domain promoting the formation of CTP from
UTP and glutamine. The identified mutation affects a splice donor site
at the junction of intron 17-18 and exon 18 (IVS18-1 G to C), leading to
the expression of an abnormal transcript lacking exon 18 (Extended
Data Fig. 1b, c). This splice mutation was found to be deleterious because
CTPS1 protein expression could not be detected in lysates of
EBV-transformed B cells and T-cell blasts from patients (Figs 1c and 2c and
Extended Data Fig. 2). In contrast, CTPS2 was expressed normally in
patient cell lysates. In the five affected families, all patients were
homozygous for the IVS18-1 G to C mutation and all parents and tested healthy
siblings were heterozygous carriers (Fig. 1a, b and data not shown).
Sequencing of a cohort of 752 healthy individuals from the northwest
of England gave an estimated frequency of homozygosity of 1:560,000.
This represents more than a tenfold increase compared to the frequency
estimated from available exome databases. WES data and analysis of
polymorphic microsatellite markers in all patients revealed a common
region of homozygosity of 1.1 Mb surrounding the IVS18-1 G to C mutation
(Supplementary Information). All these data were indicative of a
founder effect. These observations led us to conclude that the
immunodeficiency resulting from the CTPS1 mutation in these patients could
be primarily associated with a T-cell immunodeficiency.

Figure 1 | Identification of CTPS1 deficiency in patients with a combined 
immunodeficiency. a, Pedigrees of the families in which a homozygous 
IVS18-1 G to C mutation in CTPS1 was identified. When known, the genotype 
of each individual is indicated. Black boxes represent affected individuals 
and diagonal bars indicate deceased individuals. Each patient (P) is 
identified by a number. b, Diagram of the CTPS1 intron-exon organization 
and protein domains with the serine phosphorylation sites (S) indicated and 
the coding exons in grey. DNA electropherograms show the region containing 
the mutation in CTPS1 in family 1. The homozygous IVS18-1 G to C mutation 
is indicated by an arrow. Ctrl, control. c, Immunoblots for CTPS1 and CTPS2 
expression in non-stimulated EBV B-cell lines from healthy controls and 
CTPS1-mutated individuals (P1.1, P1.2 and P2.1). Actin serves as a loading 
control. d, CTPS1 mRNA expression in normal tissues monitored by RT-PCR 
in arbitrary units (a.u.). The inset shows the kinetics of CTPS1 mRNA 
expression following anti-CD3+CD28 coated beads stimulation. One 
representative experiment of three. Data represent means of technical 
triplicates; error bars denote standard deviations. 

We next examined CTPS1 expression in normal tissues. CTPS1 mRNA 
expression was comparable between the different tissues, except for 
T cells in which CTPS1 expression was strongly upregulated after cell 
activation in response to TCR-CD3 and CD28 co-stimulation (Fig. 1d). 
Interestingly, in lysates from resting non-activated T-cell blasts and T 
cells from PBMCs, CTPS1 protein was almost undetectable (Fig. 2a-d). 
In contrast, CTPS2 expression was readily detected. Activation of T cells 
by anti-CD3 antibody or phorbol 12-myristate 13-acetate (PMA) and 
ionomycin stimulations induced CTPS1 protein expression, whereas 
activation with IL-2 and/or IL-15 resulted in only a weak effect. Under 
the same experimental conditions, CTPS2 expression was also induced 
but to a lesser extent. In TCR-CD3-stimulated T-cell blasts, CTPS1 
protein expression was enhanced from 12 h and persisted for up to 96 h as a 
consequence of CTPS1 gene transcription activation (Fig. 1d, inset and 
Fig. 2b). As expected, no expression of CTPS1 was detected in T-cell 
blasts from the CTPS1-deficient patient (P1.2) contrasting with detection 
of CTPS1 mRNA and suggesting protein instability (Fig. 2c and 
Extended Data Fig. 1c). These data indicate that T-cell activation through 
the TCR results in a rapid and sustained CTPS1 protein expression. Of 
note, in B cells activated by anti-BCR and CpG, IL-4 and CD40L or PMA 
and ionomycin, CTPS1 was also found to be upregulated (Fig. 2d and 
Extended Data Fig. 3a, b). 

To further characterize the consequences of the CTPS1 deficiency in 
T cells, we investigated proximal T-cell activation signals as well as late 
responses. CTPS1-deficient T cells exhibited normal early and late 
responses with the exception of ERK1/2 phosphorylation and CD25 and 
CD69 upregulation which were found to be decreased (Extended Data 
Fig. 4). Basal and activation-induced cell death was also slightly increased 
(Extended Data Fig. 4g). These data suggest that CTPS1 deficiency had 
limited consequences in signalling downstream of TCR-CD3. Because 
the pool of CTP is potentially a limiting factor for DNA synthesis, we 
Table 1 | Clinical features of patients 

Patient | Age at first symptoms | Viral infections: EBV | Viral infections: VZV | Viral infections: others | Bacterial infections | Extra-haematopoietic manifestations | Outcome (age in years) 
P1.1 | 1 y. | SIM, chronic viraemia | ° | CMV, Norovirus, Rotavirus (gut), Parainfluenzae I (RTI) | H. influenzae (RTI) | No | HSCT (8 y.), died (GVHD) (8 y.) 
P1.2 | 1 m. | SIM | ° | Adenovirus, HHV-6, Norovirus (gut) | No | No | Alive (9 y.) 
P2.1 | 5 y. | LPD (CNS) | Yes | No | H. influenzae (RTI) | No | HSCT (9 y.), a.w. (19 y.) 
P2.2 | 2 y. | Chronic viraemia | ° | No | S. pneumoniae, H. influenzae (RTI) | ° | HSCT (7 y.), a.w. (14 y.) 
P3.1 | 1 y. | SIM | Yes (gastritis, pneumonitis) | No | S. pneumoniae (sepsis, meningitis) | No | Died (disseminated VZV) (y.) 
P3.2 | 3 m. | SIM, chronic viraemia | Yes | HHV-6 | No | ° | HSCT (8 y.), a.w. (14 y.) 
P4 | Birth | LPD (CNS) | Yes | CMV, Adenovirus, Rotavirus (gut) | No | No | HSCT (6 y.), died (LPD) (6 y.) 
P5 | 3 m. | LPD (CNS), chronic viraemia | No | Norovirus (gut), Parainfluenzae III, Adenovirus, Rhinovirus (RTI) | N. meningitidis B (meningitis) | ° | HSCT (1 y.), alive (2 y.) 

Abbreviations are: y., year; m., month; SIM, severe infectious mononucleosis; CNS, central nervous system; EBV, Epstein-Barr virus; VZV, varicella zoster virus; HHV-6, human herpes virus 6; LPD, lymphoproliferative disease; RTI, respiratory tract infection; CMV, cytomegalovirus; HSCT, haematopoietic stem cell transplantation; GVHD, graft versus host disease; a.w., alive and well. 
Figure 2 | Induction of CTPS1 expression during T-cell activation 
and defective proliferation of activated CTPS1-deficient T cells. 
a-d, Immunoblots for CTPS1 and CTPS2 expression in control T cells (from a 
healthy donor) treated with various stimuli (a) or stimulated with anti-CD3 for 
different periods of time (b), in control (Ctrl) or CTPS1-deficient cells from 
patient P1.2 stimulated with anti-CD3 for different periods of time (c) and in B 
and T cells sorted from normal PBMCs treated with indicated stimuli (d). Actin 
served as a loading control. e, Representative dot plots showing cell divisions 
by dilution of the violet dye and expression of CD25 of control (Ctrl) or 
CTPS1-deficient T cells (patient P1.2) stimulated with incremental doses of the 
anti-CD3 antibody or anti-CD3+CD28 coated beads. Insets with histograms 
show the violet dye dilution with the number of cell divisions indicated at 
the top of each peak. Data from one of four independent experiments. f, Mean 
of index values of cell division of control T cells (Ctrl) or CTPS1-deficient cells 
(P1.2) (n = 4). Unpaired Student’s t-tests and **P < 0.01. g, Representative 
dot plots of cell cycle progression of control (Ctrl) and CTPS1-deficient T-cells 
(patient P1.2) stimulated with anti-CD3 antibody. The percentages of cells in 
each stage are indicated. Data from one of two independent experiments. 
h, Proliferation of control (Ctrl) or CTPS1-deficient CD19+ B cells from 
PBMCs of healthy donor and patient P1.2. Cells were stimulated with 
anti-BCR plus CpG during 5 days. The proliferation was analysed similarly as 
in (e). Representative data from one of two independent experiments. 


carefully analysed proliferation of CTPS1-deficient T cells. In response 
to activation by antigens, anti-CD3 antibody or co-stimulation by anti- 
CD3 and anti-CD28 antibodies, CTPS1-deficient cells from three patients 
(P1.1, P1.2 and P2.2) failed to sustain proliferative responses as measured 
by [³H]thymidine uptake and CFSE or violet cell tracer dye dilution 
(resulting in a weak index of cell proliferation) (Fig. 2e, f and Extended 
Data Table 1 and Extended Data Figs 5 and 6). Uptakes of [³H]uridine 
and [³H]cytidine were also found to be impaired in activated CTPS1- 
deficient T cells. This suggests that both RNA and DNA synthesis were 
affected (Extended Data Fig. 6). Defective proliferation of CTPS1-deficient 
T cells was associated with a lack of cell cycle progression as a majority 
of cells were arrested in the G1 phase (Fig. 2g). Proliferation of CTPS1- 
deficient B cells activated by anti-BCR and CpG was also found to be 
defective, whereas that of IL-2-activated natural killer cells seemed to 
be less affected (Fig. 2h and Extended Data Fig. 5b). 

Downregulation of CTPS1 expression in control T cells, by lentiviral 
transduction of two distinct short hairpin RNAs (shRNAs) together with 
a GFP reporter gene, resulted in a specific decrease in the CD3-mediated 
proliferation of GFP-positive cells (Fig. 3a). No changes in proliferation 
were detected in non-targeted GFP-negative cells or in cells targeted 
with a scramble shRNA. The diminished proliferation resulting from 
the inhibition of CTPS1 expression led to a selective cell growth disad- 
vantage with a decreased number of GFP targeted cells over time (middle 
panel). A similar decrease in proliferation rate was also observed in the 
Jurkat T-cell line in which CTPS1 expression was downregulated (Ex- 
tended Data Fig. 7). 

Together, these results indicate that CTPS1 deficiency causes a defect 
in T-cell proliferation in response to TCR-CD3 activation. To formally 
prove the causal relationship between CTPS1 deficiency and defective 
T-cell proliferation, we carried out reconstitution experiments with wild- 
type CTPS1 or by direct addition of CTP or its cytidine precursor that 
acts on CTP levels through the salvage pathway. Expression of ectopic 
CTPS1 in CTPS1-deficient T cells fully restored proliferation upon CD3 
stimulation (Fig. 3b) and enabled cells to expand selectively as shown 
by the accumulation of GFP-positive cells expressing CTPS1 (Fig. 3c, 
left panels). No such effect was detected in CTPS1-deficient cells trans- 
duced with an empty vector or in control cells transduced with the CTPS1- 
containing vector. 

Proliferation and CD25 expression of CTPS1-deficient cells also recov- 
ered to a normal level by addition of CTP or cytidine (Fig. 3d and data not 
shown). In contrast, addition of a mix of UTP, GTP and ATP or uracil, 
guanosine and adenosine did not result in increased proliferation of 
CTPS1-deficient cells. To determine the influence of the CTPS1 defect 
on the de novo pyrimidine synthesis pathway, we measured the incorp- 
oration of carbon from [¹⁴C]aspartate into nucleic acids of activated 
CTPS1-deficient T cells, which is a specific assay for de novo pyrimidine 
synthesis (Fig. 3e and Extended Data Fig. 6b, c). Incorporation of [³H] 
thymidine and [³H]uridine was analysed in parallel as a control of global 
RNA and DNA synthesis. TCR-CD3 activation-mediated incorporation 
of [¹⁴C]aspartate, [³H]thymidine and [³H]uridine was significantly de- 
creased in CTPS1-deficient T cells. Addition of exogenous CTP or cytidine 
that bypassed the de novo synthesis pathway restored incorporation of 
[³H]thymidine but not of [¹⁴C]aspartate in CTPS1-deficient cells, thus 
demonstrating that the de novo CTP synthesis pathway is impaired in 
the absence of CTPS1. Deazauridine, an analogue of UTP and a known 
inhibitor of CTP synthetase activity, completely blocked T-cell prolif- 
eration of control cells in response to CD3 activation without affecting 
proximal TCR-CD3-mediated responses, similar to results observed in 
CTPS1-deficient cells (Fig. 3f and data not shown). As expected, inhi- 
bition of T-cell proliferation by deazauridine was fully reverted by ad- 
dition of cytidine and partially by uridine, but not by adenosine or 
guanosine. Analysis of nucleotide pools in activated CTPS1-deficient 
T-cell blasts and CTPS1-deficient B/EBV cell lines revealed decreased 
levels of CTP, as also observed in activated normal cells treated with 
deazauridine (Fig. 3g, h). Defective CTPS1 expression or addition of 
deazauridine also led to reduced pools of ATP, GTP and UTP in acti- 
vated T cells (Extended Data Fig. 8) suggesting interconnection in the 
nucleotide pools. In contrast, CTP as well as ATP, GTP and UTP were 
found to be normal or increased in resting CTPS1-deficient T cells as the 
salvage pathway is suggested to be predominant in quiescent cells. Ex- 
pression of wild-type CTPS1 in CTPS1-deficient B/EBV cell lines restored 
levels of CTP comparable to control cells and conferred a selective growth 
advantage to cells (Fig. 3h, i). 

This study reveals a critical role for CTPS1 in promoting the pro- 
liferation of T cells following their activation. However, proliferation of 
B cells was also found to be dependent on CTPS1. This may directly 
participate in the susceptibility to encapsulated bacterial infections seen 


©2014 Macmillan Publishers Limited. All rights reserved 




Figure 3 | CTPS1 is required for proliferation of T cells in response to 
TCR-CD3 activation. a, Proliferation of T cells in which CTPS1 expression 
was silenced with vectors containing shRNA for CTPS1 (Sh CTPS1 no. 1 or Sh 
CTPS1 no. 2) or containing a scramble shRNA (Sh scramble) with GFP gene 
reporter. Representative dot plots of GFP+ cells corresponding to transduced 
cells (left upper panels). Representative histograms of violet dye dilution 
showing the cell divisions after stimulation (left lower panels). Curves showing 
the ratio of the percentage of GFP+ cells at different days to the percentage 
of GFP+ cells at day 0 in long-term expansions after repeated stimulation 
(middle panel). Immunoblots for CTPS1 and CTPS2 expression in 
transduced cells and non-transduced cells (−) (right panels) and actin as 
loading control. One representative of two experiments. b, c, Proliferation of 
control (Ctrl) and CTPS1-deficient T cells (patient P1.2) transduced by empty 
or wild-type CTPS1-containing vector. Representative histograms of violet dye 
dilution (b, left panels). Means of indexes of cell division after stimulation 
(b, right panels) from triplicate of one representative of two experiments. 
Curves showing the percentage of GFP transduced cells after repeated 
stimulation (c, left panel). Representative data from one of two independent 
experiments. Immunoblots same as in (a) (c, right panels). d, Representative 
histograms of violet dye dilution showing cell divisions of control (Ctrl) and 
CTPS1-deficient T cells (P1.2) incubated with the indicated nucleotides or 
nucleosides before stimulation. Data from one of three independent 
experiments. e, Incorporation of [¹⁴C]aspartate, a tracer of the de novo 
pyrimidine nucleotide synthesis and [³H]thymidine as a control of 
proliferation/DNA synthesis. T cells were labelled during stimulation. Means of 
incorporated radioactivity (c.p.m.) (n = 6). f, Same as d except that control 
T cells were incubated with deazauridine. Data from one representative of three 
independent experiments. g, Concentration of CTP in control T cells incubated 
with deazauridine (Ctrl + Deaza., n = 4) or not (Ctrl, n = 6) and CTPS1- 
deficient cells (P1.2, n = 3) after stimulation with anti-CD3+CD28 coated 
beads. Data from three independent experiments. h, Concentration of CTP in 
cell extracts of EBV B-cell lines from healthy controls (Ctrl; n = 6), and 
CTPS1-deficient patients transduced (Pat. + CTPS1, n = 4) or not (Pat., n = 6) 
with wild-type CTPS1-containing vector. P1.1 (squares), P1.2 (circles) and 
P2.1 (triangles). For controls, each symbol corresponds to cells of a different 
donor. Data from two independent experiments. i, Proliferation of CTPS1- 
deficient EBV B-cell lines (P1.2 and P2.1) transduced by empty or wild-type 
CTPS1-containing vector. Curves showing the percentage of GFP+-transduced 
cells in culture. Unpaired t-tests and ***P < 0.001, **P < 0.01, 
*P < 0.05 (b, e, g, h). 
in CTPS1-deficient patients and account for the low titres of S. pneu- 
moniae antibodies as it is a T-cell-independent B-cell response. The role 
of CTPS1 in B cells could be different and/or less important than in 
T cells. Of note, CTPS1-deficient B cells preserve an intact capacity to 
expand upon transformation by EBV and patients had normal immu- 
noglobulin levels and/or elevated IgG. Decreased expansion of natural 
killer cells and low numbers of iNKT and MAIT cells might also con- 
tribute to the CTPS1 immunodeficiency as these cells have been pro- 
posed to have a role in a broad range of immune responses including 
anti-EBV immunity. The finding that CTPS1 deficiency causes no 
other significant clinical consequences favours a redundancy with CTPS2 
activity in other cell lineages and tissues. Interestingly, pyrimidine pools 
including CTP have been previously shown to be strongly expanded in 
PHA-stimulated T cells by de novo pathways including increased CTPS 
activity. The induction of CTPS1 expression in activated T cells reported 
here thus appears as the major determinant of CTP pool increase. Indeed, 
proliferation of CTPS1-deficient T cells was restored to normal levels 
by addition of CTP. The exact mechanism(s) by which TCR signalling 
induces a rapid expression and activation of CTPS1 in T cells remains 
to be determined, although we showed that the ERK pathway is required, 
as well as tyrosine phosphorylation signals (Extended Data Fig. 3c). It 
is interesting to note that T-cell differentiation does not appear to be 
severely impaired by CTPS1 deficiency, suggesting that CTP pools in 
thymocytes may originate from the nucleoside salvage pathway and/ 
or the CTPS2 activity. Notably, CTPS1 activity is critical for the in- 
tense cell division induced by antigenic stimulation as exemplified by mas- 
sive proliferation and expansion of CD8+ T cells during viral infections. 
In the absence of CTPS1, we showed that the de novo pyrimidine syn- 
thesis pathway is impaired but not totally abrogated. This residual activity 
is probably dependent on CTPS2. Recently, the de novo pyrimidine syn- 
thesis pathway was shown to be dependent on post-transcriptional reg- 
ulation by mTORC1 and S6 protein (S6K) kinases that activate the first 
enzymatic steps required for pyrimidine synthesis. Thus, distinct 
regulatory mechanisms control de novo pyrimidine synthesis. Based on 
the present study, CTPS1-mediated tuning of CTP synthesis in lympho- 
cytes appears to be a key element in enabling adaptive immune responses. 
If CTPS1-specific inhibitors can be designed, they would potentially 
be highly specific immunosuppressive drugs able to inhibit auto- or 
allogenic-specific T-cell and B-cell responses without additional tox- 
icity given the lymphocyte specificity of the CTPS1-deficiency pheno- 
type. In conclusion, our results provide the first in vivo evidence of a 
role of the de novo pyrimidine synthesis pathway as a critical step for 
proliferation of T and B lymphocytes when activated by antigens. 


METHODS SUMMARY 


Informed consent was obtained from donors, patients and families of patients. The 
study and protocols conform to the 1975 Declaration of Helsinki as well as to local 
legislation and ethical guidelines. See Methods for full experimental procedures. In 
several experiments, data are expressed as means ± standard deviation (s.d.) denoted 
by the error bars. P values were calculated by two-tailed Student’s t-test. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 


Cohorts of patients. Besides the four initially identified patients, four additional 
patients were identified by screening 10 patients (9 families) originating from the 
northwest of England with severe chronic viral infections, mostly caused by herpes 
viruses, including EBV and VZV. Furthermore, 24 patients (24 families) originat- 
ing from different geographical areas with the same phenotype were also tested for 
all exons of CTPS1 in order to identify other mutations and none was found to be a 
carrier of CTPS1 mutations. 

Exome sequencing and analysis. Exome capture was performed according to 
the manufacturer’s protocol using the Illumina TruSeq exome enrichment kit and 
sequencing of 100 bp paired-end reads on an Illumina HiSeq. Approximately 10 Gb 
of sequence were obtained for each subject such that 90% of the coding bases of the 
exome defined by the consensus coding sequence (CCDS) project were covered by 
at least 10 reads. Adaptor sequences and quality trimmed reads were removed using 
the Fastx toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) and a custom script was 
then used to ensure that only read pairs with both mates present were subsequently 
used. Reads were aligned to hg19 with BWA (ref. 31), and duplicate reads were marked 
using Picard (http://picard.sourceforge.net/) and excluded from downstream ana- 
lyses. Single nucleotide variants (SNVs) and short insertions and deletions (indels) 
were determined using SAMtools (http://samtools.sourceforge.net/) pileup and 
varFilter (ref. 32) with the base alignment quality (BAQ) adjustment disabled, and they 
were then quality filtered to require at least 20% of reads supporting the variant call. 
Variants were annotated using both ANNOVAR (ref. 33) and custom scripts to identify 
whether they affected protein coding sequences, and whether they had previously 
been seen in the dbSNP, the 1000 Genomes data set (1,092 genomes), or in approxi- 
mately 2,073 exomes previously sequenced at our centre. A variant detected in a 
patient was considered to be a candidate mutation if it had not been reported or had 
a frequency below 0.001% in the three databases indicated above. At the time the 
homozygous G to C mutation in CTPS1 (at position chr1:41475832) was identified 
by WES in the three patients (P1.1, P1.2 and P2.1), it was not described in dbSNP 
or assigned an rsID. The mutation was subsequently identified in the NHLBI 
GO Exome Sequencing Project (http://evs.gs.washington.edu/EVS/) with the assigned 
rsID rs145092287. In the NHLBI GO Exome Sequencing Project, rs145092287 is 
present three times in a heterozygous state among 4,300 genomes from a European- 
American population and not found in the 2,203 genomes of an African-American 
population. The rs145092287 was not found in a homozygous or heterozygous 
status in other available genome databases (NCBI, 1000 Genomes project and the 
3,519 genomes of our centre). Homozygosity regions around the rs145092287 were 
determined in the exomes by looking at the homozygous variations. Between the 
positions chr1:40737516 (rs6677717) and chr1:42008077 (rs63729761) a succes- 
sion of 97 homozygous variations (without heterozygous variations) was found to 
be shared by the three patients. 
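
The filtering and homozygosity steps described above can be sketched in Python. This is an illustration only, under stated assumptions: the `Variant` record, its field names and both helper functions are hypothetical and are not the custom scripts used in the study. Only the two thresholds (at least 20% of reads supporting the call, database frequency below 0.001%) and the notion of an uninterrupted succession of homozygous variations are taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Thresholds quoted in the Methods text.
MIN_ALT_READ_FRACTION = 0.20      # at least 20% of reads support the variant
MAX_DB_FREQUENCY = 0.001 / 100.0  # frequency below 0.001% in the databases


@dataclass
class Variant:
    """Hypothetical minimal variant record (field names are assumptions)."""
    pos: int                              # position on one chromosome
    alt_reads: int                        # reads supporting the variant
    total_reads: int                      # reads covering the position
    homozygous: bool                      # genotype call
    db_frequency: Optional[float] = None  # None means "never reported"


def is_candidate(v: Variant) -> bool:
    """Apply the read-support filter, then the rarity filter."""
    if v.total_reads == 0:
        return False
    if v.alt_reads / v.total_reads < MIN_ALT_READ_FRACTION:
        return False
    return v.db_frequency is None or v.db_frequency < MAX_DB_FREQUENCY


def longest_homozygous_run(
    variants: Iterable[Variant],
) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (count, start_pos, end_pos) of the longest succession of
    homozygous variants that is not interrupted by a heterozygous one."""
    best: Tuple[int, Optional[int], Optional[int]] = (0, None, None)
    count, start = 0, None
    for v in sorted(variants, key=lambda x: x.pos):
        if v.homozygous:
            if count == 0:
                start = v.pos
            count += 1
            if count > best[0]:
                best = (count, start, v.pos)
        else:
            count, start = 0, None
    return best
```

The sketch handles a single sample on a single chromosome. Finding the interval shared by several patients, as in the study, would additionally require intersecting the per-patient runs, which is left out here.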

Genomic DNA sequencing. Genomic DNA from peripheral blood cells, EBV-B 
cell lines and/or fibroblasts of patients, their parents, and other family members 
was isolated according to standard methods. Genomic DNAs of 752 healthy con- 
trol subjects born in the northwest of England were obtained from the UK 1958 
Birth Cohort (http://www2.le.ac.uk/projects/birthcohort). The estimated frequency 
of the CTPS1 mutation in the populations was calculated according to the Hardy- 
Weinberg law. Oligonucleotide primers flanking the 3’ region of intron 17-18 and 
exon 18 of CTPS1 were used to amplify genomic DNA: forward 5’-AGAGTTGGT 
GGTAGGGTGTGTGAC-3’ and reverse 5’-CTTGCAATCGCAGTGTGTTATC 
AC-3'. PCR products were amplified using high fidelity Platinum Taq DNA Poly- 
merase (Invitrogen) according to the manufacturer’s recommendations, purified 
with the QIAquick gel extraction kit (Qiagen) and sequenced using the ABI PRISM 
BigDye Terminator Cycle Sequencing Ready Reaction Kit (PerkinElmer) according 
to the manufacturer’s recommendations. All collected sequences were analysed using 
4peaks software (Version 1.7.2; A. Griekspoor and T. Groothuis, http://nucleobytes. 
com/index.php/4peaks). 
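
The Hardy-Weinberg estimate mentioned above can be illustrated with a short Python sketch. The carrier count used for the 752-person cohort (two heterozygotes) is an assumption: it is not stated in the text and is chosen only because it reproduces the reported homozygosity frequency of about 1:560,000. The exome-database figures (three heterozygotes among 4,300 European-American genomes) are taken from the Exome sequencing section. The function names are hypothetical.

```python
def allele_frequency(het_carriers: int, n_individuals: int,
                     hom_carriers: int = 0) -> float:
    """Mutant allele frequency q from genotype counts in diploid individuals."""
    return (het_carriers + 2 * hom_carriers) / (2 * n_individuals)


def expected_homozygote_frequency(q: float) -> float:
    """Hardy-Weinberg expectation for homozygous mutants: q squared."""
    return q * q


# Assumed: 2 heterozygous carriers among the 752 controls (not stated in the text).
q_cohort = allele_frequency(het_carriers=2, n_individuals=752)
# From the text: 3 heterozygous carriers among 4,300 genomes.
q_database = allele_frequency(het_carriers=3, n_individuals=4300)

one_in_cohort = 1 / expected_homozygote_frequency(q_cohort)
one_in_database = 1 / expected_homozygote_frequency(q_database)
fold_increase = one_in_database / one_in_cohort

print(f"cohort: about 1 in {one_in_cohort:,.0f}")
print(f"database: about 1 in {one_in_database:,.0f}")
print(f"fold increase: {fold_increase:.1f}")
```

Under these assumptions the cohort estimate is about 1 in 565,000 and the database estimate about 1 in 8.2 million, a ratio of roughly 14.5, which is consistent with the "more than a tenfold increase" stated in the main text.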

Analysis of microsatellite markers. Microsatellite markers were genotyped using 
UniSTS sequences and mapping information available from the NCBI (http://www. 
ncbi.nlm.nih.gov). Genomic DNA from patients was used as templates to amplify 
by PCR with specific fluorescent labelled oligonucleotides, the polymorphic repeats 
corresponding to the microsatellite markers. PCR products were evaluated using an 
ABI 3100 DNA Fragment Analyzer (Applied Biosystems), and data were analysed 
using Genescan and Genotyper software (Applied Biosystems). 

Gene expression analysis. Total RNA was isolated from EBV-B cell and activated 
T-cell blasts of P1.2, P2.1 patients and control donors using the RNeasy Mini kit 
(Qiagen). The samples were depleted of genomic DNA and reverse transcription 
was performed using Superscript II First Strand Synthesis System (Invitrogen). 
cDNAs were used as a template to perform PCR amplifications of exon 15 to exon 
19 of CTPS1 or exon 4 of actin with the following primers using standard proto- 
cols: CTPS1 forward primer: 5'-GAGAGGCACCGCCACCGATTTG-3’, CTPS1 
reverse primer: 5’-GCCAGTACACGTGATGGGACATGC-3’, actin forward primer: 
5'-CTCCTTAATGTCACGCACGAT-3’; actin reverse primer: 5’-CTCCTTAAT 
GTCACGCACGAT-3’. PCR to amplify full length CTPS1 cDNAs from control and 
patients cells were also performed using the following primers, forward primer: 
5'-AGCTCTGTCGCTGACGGGAGGAT-3’ (exon1); reverse primer: 5’-GCCA 
GTACACGTGATGGGACATGC-3’ (exon 19). PCR products were verified by se- 
quencing revealing the expression of an abnormal CTPS1 transcript lacking exon 18 
in patients’ cells. Multiple tissue cDNA panels from Ozyme (Human MTC panel I, 
II and Human Immune System MTC panel) were analysed for CTPS1 and CTPS2 
gene expression by qRT-PCR. Gene expression assays were performed with Assays- 
on-Demand probe and primer combinations (CTPS1, Hs01041858; CTPS2, 
Hs00219845; GAPDH, Hs027558991) from Applied Biosystems labelled with 6- 
carboxy-fluorescein (FAM) dye, and universal reaction mixture. Real time quant- 
itative PCRs for GAPDH, CTPS1 and CTPS2 were performed in triplicate using a 
LightCycler VIIA7 System (Roche). Expression levels were determined by relative 
quantification using the comparative threshold cycle method $2^{-\Delta\Delta C_t}$, in which $\Delta\Delta C_t$ 
is determined as follows: $\Delta\Delta C_t = (C_{t,\text{target gene}} - C_{t,\text{reference gene}})_{\text{target tissue}} - (C_{t,\text{target gene}} - C_{t,\text{reference gene}})_{\text{calibrator tissue}}$. 
The results shown in arbitrary units (a.u.) have 
been normalized to GAPDH gene expression and are presented as the relative 
change in gene expression normalized against the calibrator sample corresponding 
to leukocyte tissue. 
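
As a worked illustration of the comparative threshold cycle calculation described above, the following Python sketch computes the relative expression from triplicate Ct values. The function name and all Ct values are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    ct_target_tissue: Sequence[float],
    ct_reference_tissue: Sequence[float],
    ct_target_calibrator: Sequence[float],
    ct_reference_calibrator: Sequence[float],
) -> float:
    """Relative expression by the comparative Ct method.

    Each argument holds replicate Ct values (for example technical
    triplicates). The reference gene would be GAPDH and the calibrator
    sample the leukocyte tissue in the set-up described in the text.
    """
    delta_ct_tissue = mean(ct_target_tissue) - mean(ct_reference_tissue)
    delta_ct_calibrator = mean(ct_target_calibrator) - mean(ct_reference_calibrator)
    delta_delta_ct = delta_ct_tissue - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical triplicates: the target gene crosses threshold 3 cycles earlier
# in the tissue of interest than in the calibrator, with equal reference Ct.
fold = relative_expression(
    ct_target_tissue=[24.0, 24.0, 24.0],
    ct_reference_tissue=[18.0, 18.0, 18.0],
    ct_target_calibrator=[27.0, 27.0, 27.0],
    ct_reference_calibrator=[18.0, 18.0, 18.0],
)
print(fold)  # 8.0, i.e. 2 to the power 3
```

A difference of three cycles therefore corresponds to an eightfold change, assuming the amplification efficiency is close to 100% for both genes, which is the usual premise of this method.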

Cell culture and stimulation. Peripheral blood mononuclear cells (PBMCs) col- 
lected from patients and healthy donors were isolated by Ficoll-Paque density 
gradient (Lymphoprep, Proteogenix) from blood samples using standard proce- 
dures. Expansion of T-cell blasts was obtained by incubating PBMCs for 72 h with 
phytohaemagglutinin (PHA) (2.5 µg ml⁻¹, Sigma-Aldrich) in RPMI 1640 GlutaMax 
medium (Invitrogen) supplemented with 5% human male AB blood group serum 
(BioWest), penicillin (100 U ml⁻¹) and streptomycin (100 µg ml⁻¹). After 3 days, 
dead cells were removed by Ficoll-Paque density gradient and blasts were main- 
tained in culture with IL-2 (100 U ml⁻¹). For proliferation and cell cycle analyses, 
blasts were washed and cultured without IL-2 for 72 h to synchronize the cells. Blasts 
or PBMCs were then cultured for 4 to 6 days in complete medium alone or in 
the presence of 0.1, 1 or 10 µg ml⁻¹ coated anti-CD3 antibody (clone OKT3, eBio- 
sciences), anti-CD3/CD28-coated beads (Invitrogen), PHA (2.5 µg ml⁻¹), 10 °M 
ionomycin (Sigma-Aldrich) plus 107” M phorbol 12-myristate 13-acetate (PMA, 
Sigma-Aldrich), Candidin (5 µg ml⁻¹, Bio-Rad), tetanus toxoid (1 to 8,000 dilu- 
tion, Statens Serum Institute) or tuberculosis antigen (50 µg ml⁻¹, Statens Serum 
Institute). Proliferation and cell cycle were analysed at the indicated time points. 
3-Deazauridine (DAZ, Sigma-Aldrich; 40 µM) was added 12 h before 
the stimulation. In complementation experiments, blasts were incubated with 100 µM 
of CTP, UTP, GTP or ATP (New England Biolabs) separately or in combination, or 
with 200 µM of cytidine, uridine, guanosine or adenosine (Sigma-Aldrich) sepa- 
rately or in combination 12 h before the stimulation. For nucleotide quantification, 
blasts were deprived of IL-2 for 72 h and then stimulated or not with anti-CD3/CD28- 
coated beads for 48 h, after which cell lysates were prepared. Jurkat cells, 293-T cells and 
EBV B-cell lines from patients were cultured in RPMI 1640 GlutaMax medium sup- 
plemented with 10% heat-inactivated fetal calf serum (Gibco), penicillin (100 U ml⁻¹) 
and streptomycin (100 µg ml⁻¹). Cells were free of mycoplasma contamination. 

Proliferation and cell cycle assays. Cell proliferation was monitored by labelling 
cells with the cell trace violet dye (CellTrace violet proliferation kit, Invitrogen) or 
CFSE (5 µM, Invitrogen) before stimulation, according to the manufacturer’s 
instructions. After 4 or 6 days of culture, cells were collected and CellTrace violet 
or CFSE fluorescence dilution was assessed by flow cytometry. The division index 
of proliferation was calculated using FlowJo software (Tree Star) and corresponds 
to the average number of cell divisions per cell including the undivided peak. T-cell 
responses within total PBMCs were also measured by [³H]thymidine incorpora- 
tion after 6 days of stimulation. A total of 0.074 MBq ml⁻¹ of [³H]thymidine was 
added during the last 18 h of stimulation. Cell proliferation was determined by 
c.p.m. of [³H]thymidine incorporated in cells that were counted with TopCount 
NXT beta counter (PerkinElmer). Cell cycle analysis was determined by measur- 
ing the incorporation of the nucleoside analogue 5-ethynyl-2'-deoxyuridine (EdU) 
into newly synthesized DNA, according to the manufacturer’s instructions (Click- 
iT EdU, Invitrogen) after 48 h of anti-CD3 stimulation. EdU incorporation in cells 
was measured following conjugation of EdU to azide-modified Alexa Fluor 647 
dye. Cells were analysed by flow cytometry with a FACS-Canto II flow cytometry 
system (BD Biosciences). 
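
The division index defined above (average number of divisions per original cell, undivided cells included) can be illustrated with a minimal sketch. It assumes that the number of events in each dye-dilution peak has already been obtained by gating; the function name and the example counts are hypothetical, and the peak-fitting model used internally by FlowJo is not reproduced here.

```python
def division_index(events_per_generation):
    """Average number of divisions per original (input) cell.

    events_per_generation[g] is the number of events counted in the peak
    of cells that have divided g times (g = 0 is the undivided peak).
    Each cell in generation g descends from 1 / 2**g of an original cell.
    """
    # Back-calculate how many original cells gave rise to each peak.
    precursors = [n / (2 ** g) for g, n in enumerate(events_per_generation)]
    total_precursors = sum(precursors)
    if total_precursors == 0:
        raise ValueError("no events supplied")
    # A precursor that ended up in generation g went through g divisions.
    total_divisions = sum(g * p for g, p in enumerate(precursors))
    return total_divisions / total_precursors


# Hypothetical counts: 100 undivided, 200 divided once, 400 divided twice.
example = division_index([100, 200, 400])
```

With these illustrative counts each peak traces back to 100 original cells, so the index is (0 + 100 + 200) / 300 = 1.0.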

Nucleic acids and de novo pyrimidine synthesis assays. PBMCs were stimulated
in the presence of 1 µg ml⁻¹ coated anti-CD3 antibody (clone OKT3, eBiosciences)
or 2.5 µg ml⁻¹ PHA (Sigma-Aldrich) for 3 days or in the presence of candidin
(5 µg ml⁻¹, Bio-Rad), tetanus toxoid (1 to 8,000 dilution, Statens Serum Institute)
or PPD (tuberculin) (50 µg ml⁻¹, Statens Serum Institute) for 6 days. Then
0.074 MBq ml⁻¹ of [³H]thymidine, [³H]cytidine, [³H]uridine or [³H]leucine or
0.185 MBq ml⁻¹ of U-[¹⁴C]aspartate was added during the last 18 h of stimulation.
For [³H]cytidine, this corresponds to 0.133 µM, which does not restore normal
proliferation of CTPS1-deficient cells, which required 50 µM. Cells were collected
with a Filter Mate harvester (PerkinElmer) on filters for labelled cell assays (Unifilter
plates, PerkinElmer) that retain nucleic acids, and filters were then washed.
Radioactivity (c.p.m.) on filters (corresponding to radiolabelled compounds incorporated
in nucleic acids) was measured by liquid scintillation counting with a TopCount
NXT beta counter (PerkinElmer).
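
As a consistency check on the tracer concentration, the two figures quoted above for tritiated cytidine (0.074 MBq per ml and 0.133 µM, the latter read from a partly garbled unit in the source) imply a specific activity for the labelled stock. The sketch below only does the unit conversion; the function name is an illustrative assumption, and the resulting value is derived here rather than stated by the authors.

```python
BQ_PER_CI = 3.7e10  # definition of the curie


def specific_activity_bq_per_mol(activity_mbq_per_ml, conc_micromolar):
    """Specific activity implied by a tracer activity and molar concentration."""
    activity_bq_per_l = activity_mbq_per_ml * 1e6 * 1e3  # MBq/ml -> Bq/l
    conc_mol_per_l = conc_micromolar * 1e-6              # µM -> mol/l
    return activity_bq_per_l / conc_mol_per_l


sa_bq_per_mol = specific_activity_bq_per_mol(0.074, 0.133)
sa_ci_per_mmol = sa_bq_per_mol / BQ_PER_CI / 1e3  # Ci/mol -> Ci/mmol
```

The result is about 15 Ci per mmol, and the tracer concentration is roughly 375-fold below the 50 µM of cytidine reported to restore proliferation.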

Apoptosis assay. Apoptosis was determined by evaluating phosphatidylserine
(PS) exposure in the outer leaflet of the cytoplasmic membrane with PE-conjugated
annexin V in combination with 7-AAD Viaprobe (Apoptosis Detection Kit I, BD)
12 h after stimulation with coated OKT3 (0.1, 1 or 10 µg ml⁻¹). Apoptosis was
scored as the percentage of annexin V⁺/7-AAD⁻ cells to exclude necrotic cells. Cells
were analysed by flow cytometry.

Cytokine production and degranulation. For intracellular staining of cytokines,
cells were stimulated overnight with PMA and ionomycin or anti-CD3/CD28 beads
in the presence of brefeldin A (GolgiPlug, BD). Cells were then fixed and permeabilized
using the BD Cytofix/Cytoperm Plus kit (BD Pharmingen) according to
the manufacturer’s instructions. Cells were labelled with PE-anti-IL-2 (rat IgG2a,
MHQ1-17H12), PE/Cy7-anti-TNF-α (mouse IgG1; MAb11), APC-anti-IFN-γ (mouse
IgG1, 4S.B3) and isotype-matched monoclonal antibodies purchased from BioLegend
and analysed by flow cytometry. Degranulation was determined by analysis
of the expression of CD107/LAMP, a marker of the exocytosis of lytic granules.
Blasts were stimulated for 3 h in the presence of 0.3, 3 or 30 µg ml⁻¹ coated OKT3
and simultaneously labelled with eFluor660-anti-CD107a (mouse IgG1; eBioH4A3),
eFluor660-anti-CD107b (mouse IgG1; eBioABL-93) or isotype-matched monoclonal
antibodies purchased from eBiosciences. Thereafter, cells were collected, washed
and stained with FITC-anti-CD3 and PE-anti-CD8 monoclonal antibodies and
analysed by flow cytometry.

Flow cytometry. Cell staining and the flow-cytometry-based phenotypic analyses
of PBMCs and cells were performed according to standard flow cytometry methods.
The following monoclonal antibodies were conjugated to fluorescein isothiocyanate
(FITC), phycoerythrin (PE), phycoerythrin-Cyanine7 (PE/Cy7), phycoerythrin-Cyanine5,
phycoerythrin-Cyanine5.5, allophycocyanin (APC), allophycocyanin-Vio7,
VioBlue or VioGreen: anti-CD25 (M-A251), anti-CD27 (M-T271), anti-CD31
(WM59), anti-CD45RA (HI100), anti-CD45RO (UCHL1), anti-CD197/CCR7 (3D12),
anti-TCRαβ (T10B9), anti-TCRγδ (B1), anti-CD95 (DX2), anti-CD19 (HIB19),
anti-CD21 (B-ly4), anti-IgM (G20-127), anti-IgD (IA6-2), anti-CD56 (B159) and
anti-CD16 (3G8), all purchased from BD Biosciences, and anti-CD3 (BW264/56), anti-CD4
(VIT4), anti-CD8 (BW135/80) and anti-CD69 (FN50) from Miltenyi Biotec. iNKT
cells were detected by staining with anti-Vα24 (C15) and anti-Vβ11 (C21) (Beckman
Coulter) and MAIT cells by staining with anti-Vα7.2 (3C10) and anti-CD161
(HP-3G10) (BioLegend). All data were collected on a FACS-Canto II cytometer (BD
Biosciences) and analysed using FlowJo Version 9.3.2 software (Tree Star).
Immunoblotting and analysis of CTPS1 protein expression. Cells (5 × 10⁶ cells
per ml) were stimulated by anti-CD3 antibody (1 µg ml⁻¹) crosslinked with a rabbit
anti-mouse IgG (2 µg ml⁻¹) or by anti-CD3/CD28 beads for the indicated time periods.
Cells were then lysed in 1% NP40, 50 mM Tris pH 8, 150 mM NaCl, 20 mM EDTA,
1 mM Na₃VO₄, 1 mM NaF and complete protease inhibitor cocktail (Roche),
as previously described. Protein concentrations were quantitated by BCA assay
(Bio-Rad). Then 80 µg of proteins were separated by SDS-PAGE and transferred
onto PVDF membranes (Millipore). Membranes were blocked with milk or
BSA before incubation with antibodies. The following monoclonal antibodies and
rabbit polyclonal antibodies were used for immunoblotting: anti-PLC-γ1 (#2822S),
anti-phosphorylated PLC-γ1 (#2821S), anti-phosphorylated ERK1/2 (clone E10,
#9106S), anti-ERK1/2 (#4695S), anti-phosphorylated IκBα (clone 5A5, #9246S),
anti-phosphorylated PKCtheta (#9377S), NF-κB (clone D14E12), anti-phosphorylated
AKT (Serine 473, clone 587F11) and anti-phosphorylated tyrosine (4G10) purchased
from Cell Signaling Technology, and rabbit polyclonal antibodies anti-actin
(#A2066), anti-CTPS1 raised against residues 341 to 355 (#SAB111071) or
416 to 430 (#SAB111072) and anti-CTPS2 (#HPA017437) purchased from Sigma-Aldrich.
Anti-CTPS1 rabbit polyclonal antibodies (K-21) from Santa Cruz were
also tested. Membranes were then washed and incubated with anti-mouse or anti-rabbit
HRP-conjugated secondary antibodies from Cell Signaling and GE Healthcare,
respectively. Pierce ECL western blotting substrate was used for detection. For
inhibition assays of the signalling pathways after TCR-CD3 activation, cells were
stimulated with anti-CD3/CD28 beads for 48 h in the presence of 100 nM of
the MAPK/ERK inhibitor PD0325901, 10 µM of the Src family protein tyrosine
kinase inhibitor PP1, 10 µM of the Src family protein tyrosine kinase inhibitor PP2,
10 µM of the selective Ca²⁺ chelator 1,2-bis-(2-aminophenoxy)ethane-N,N,N′,N′-tetraacetic
acid tetra(acetoxymethyl) ester (BAPTA/AM), 10 µM of the IκBα phosphorylation
inhibitor Bay 11-7085 or 10 µM of the PI3Kdelta inhibitor IC87114. All
were from Sigma-Aldrich, except IC87114, which was from Calbiochem. The concentrations
used were typical and previously reported. After 48 h of incubation with the different
inhibitors, cell viability was verified and was more than 90% in each condition. The
activity and the selectivity of the inhibitors were verified in parallel by immunoblotting
for phospho-tyrosine (for PP1 and PP2), IκBα phosphorylation (for Bay
11-7085), ERK phosphorylation (for PD0325901) and AKT phosphorylation (for
IC87114) (data not shown).

Calcium flux analysis. Ca²⁺ responses were assessed by flow cytometry, as previously
described. Briefly, cells were loaded with 5 µM Indo-1 AM (Molecular Probes),
washed, incubated with anti-CD4-APC and anti-CD8-PE monoclonal antibodies,
stimulated by anti-CD3 antibody (0.125 µg ml⁻¹) crosslinked with F(ab′)₂ rabbit
anti-mouse IgG (10 µg ml⁻¹) and then incubated with ionomycin (1 µM). Cells
were analysed with a FACSAria flow cytometer (BD Biosciences). Ca²⁺ flux data
were obtained using the kinetic analyses of the FlowJo software package (Tree Star).
Intracellular Ca²⁺ levels correspond to the normalized ratio of Ca²⁺-bound to Ca²⁺-free
Indo-1 fluorescence and are plotted as a function of time.

Plasmid constructs, cell transfections and infections. A full-length cDNA encoding
wild-type CTPS1 and a full-length cDNA encoding the mutant CTPS1Δ18 were
obtained by RT-PCR from control blasts and blasts from patient P1.2, respectively,
using the forward 5′-CGGGATCCCACCATGAAGTATATTCTGGTT-3′ and reverse
5′-CCGCTCGAGTCAGTCATGATTTATTGA-3′ (for wild type) and
5′-CCGCTCGAGTTAAAGAAAGTCTCCAAGC-3′ (for CTPS1Δ18) specific primers.
The cDNAs were verified by sequencing and inserted into a bicistronic lentiviral
expression vector encoding the green fluorescent protein (GFP) as a reporter (pLenti7.3/
V5-TOPO, Invitrogen). Viral particles for infection were obtained by co-expression
of the lentiviral vector containing CTPS1 with third-generation lentiviral plasmids
containing Gag-Pol, Rev and the G protein of the vesicular stomatitis virus (VSVG)
in HEK 293T cells using calcium phosphate. Viral supernatants were collected every
12 h on 2 consecutive days, starting 48 h after transfection, and viral particles were
concentrated by ultracentrifugation at 49,000g for 1.5 h at 12 °C. Cells were infected
with viral particles at a minimal titre of 10⁷ transducing units per ml and, 48 h after
infection, cells were deprived of IL-2 for 72 h for proliferation assays. To assess
the selective advantage of GFP expression during long-term expansion, blasts were
re-stimulated with anti-CD3/CD28 beads (Invitrogen) every 48 h for 8 days.
For CTPS1 gene knockdown, blasts or Jurkat cells were infected at day 3 of PHA
stimulation with the pLKO.1 lentiviral vector containing a CTPS1-specific shRNA
(OpenBiosystems, no. TRCN0000045349 and no. TRCN0000045350) or a scrambled
shRNA, in which the puromycin resistance gene was replaced by the GFP gene.
Proliferation was analysed in GFP⁺ and GFP⁻ blasts after 4 days of stimulation with
anti-CD3/CD28 beads as previously described. To survey the loss of GFP expression
in long-term expansions, blasts were repeatedly stimulated with anti-CD3/CD28
beads every 48 h for 8 days. Jurkat cells were maintained in culture for 26 days
after infection. The percentages of GFP⁺ cells were determined
by flow cytometry.
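
The design of the three cloning primers listed above can be checked with a short sketch. The primer sequences are taken from the text; everything else is an assumption made for illustration: the helper names are hypothetical, and the reading of GGATCC and CTCGAG as BamHI and XhoI recognition sites is an inference from the sequences, since the source does not name the enzymes.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

FORWARD = "CGGGATCCCACCATGAAGTATATTCTGGTT"
REVERSE_WT = "CCGCTCGAGTCAGTCATGATTTATTGA"
REVERSE_D18 = "CCGCTCGAGTTAAAGAAAGTCTCCAAGC"

BAMHI_SITE = "GGATCC"   # assumed: not named in the source
XHOI_SITE = "CTCGAG"    # assumed: not named in the source
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stop_codon_before_site(reverse_primer, site=XHOI_SITE):
    """Read a reverse primer on the sense strand and return the three bases
    immediately 5' of the restriction site if they form a stop codon."""
    sense = reverse_complement(reverse_primer)
    i = sense.find(site)
    if i < 3:
        return None
    codon = sense[i - 3:i]
    return codon if codon in STOP_CODONS else None


start_codon_index = FORWARD.find("ATG")          # start codon in the forward primer
wt_stop = stop_codon_before_site(REVERSE_WT)     # stop codon kept by the wild-type primer
d18_stop = stop_codon_before_site(REVERSE_D18)   # stop codon kept by the mutant primer
```

On the sense strand each reverse primer ends with a stop codon directly followed by the CTCGAG site (TGA for the wild-type primer, TAA for the CTPS1Δ18 primer), and in the forward primer the GGATCC site precedes CACC and the ATG start codon.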

Quantification of intracellular nucleotides. Intracellular pools of nucleotides
were quantified based on previously described methods. Briefly, five million cells
were washed in 0.1 M phosphate buffer (pH 7.4) and lysed in 60 µl of 1 M HClO₄
containing 2 µM 8-bromo-AMP (8-BrAMP) as an internal standard. After centrifugation
at 12,000g for 5 min at 4 °C, supernatants were transferred to a 384-well plate
and kept at 4 °C in an autosampler before injection. Aliquots of 5 µl were injected
onto a separation column (ACQUITY UPLC BEH300 C18, 1.7 µm, 2.1 × 100 mm
reversed-phase column, Waters) with a flow rate of 0.5 ml min⁻¹ and analysed
with a tandem mass spectrometry system consisting of an Acquity Ultra Performance
Liquid Chromatography system (UPLC, Waters) interfaced with a Xevo TQ-S tandem
quadrupole mass spectrometer (Waters). Mobile phase A was 0.1% formic
acid in water and mobile phase B was 0.1% formic acid in acetonitrile. A programmed
mobile-phase gradient was used during a 7-min run: 0 min, 1% B; 5 min, 10% B;
5.1 min, 100% B; 6 min, 100% B; 6.1 min, 1% B; 7 min, 1% B. The content of the 4
nucleotides ATP, GTP, UTP and CTP was quantified in the electrospray negative ion
mode with multiple reaction monitoring (MRM). Transitions of m/z 505.9 > 408
and 505.9 > 272.9 were used for quantification and confirmation of ATP, respectively,
and those of 521.9 > 158.9 and 521.9 > 177 for GTP, 482.8 > 158.9 and 482.8 > 79
for UTP, and 481.8 > 158.9 and 481.8 > 384 for CTP. Concentrations were determined
by using calibration curves of the 4 nucleotides. The linearity, accuracy
and variability were determined for the technical validation of this assay. The
correlation coefficient of the linear regression curves was greater than 0.99
for the 4 nucleotides. The recovery of samples spiked with
the 4 nucleotides at concentrations of 90 mg l⁻¹ and 250 mg l⁻¹ ranged from 72% to
123%. The maximum intra- and inter-assay variability was 22% and 23%, respectively.
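
The gradient programme listed above can be written down as a table of breakpoints. The sketch below returns the percentage of mobile phase B at any time in the 7-min run; it assumes linear ramps between consecutive breakpoints, which the text does not state explicitly, and the names used are illustrative.

```python
# (time in min, % mobile phase B), as listed in the Methods.
GRADIENT = [
    (0.0, 1.0),
    (5.0, 10.0),
    (5.1, 100.0),
    (6.0, 100.0),
    (6.1, 1.0),
    (7.0, 1.0),
]


def percent_b(t_min, program=GRADIENT):
    """Percentage of mobile phase B at time t_min (linear interpolation)."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time outside the programmed run")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient table is not sorted by time")


halfway_through_ramp = percent_b(2.5)   # middle of the 1% to 10% ramp
during_wash = percent_b(5.5)            # 100% B wash step
re_equilibration = percent_b(6.5)       # back at starting conditions
```

Under this reading the separation happens during the shallow 1% to 10% B ramp over the first 5 min, followed by a brief 100% B wash and a return to starting conditions.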
Statistical analysis. P values were calculated with a two-tailed Student’s t-test using PRISM
software (GraphPad Software). The variance was similar
between the groups that were statistically compared.
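
Because the variances are described as similar between groups, the test corresponds to the classical equal-variance (pooled) form of Student's t-test. The sketch below computes only the t statistic and the degrees of freedom using the standard library; the function name is hypothetical, the input values are made-up numbers for illustration and not data from this study, and the two-tailed P value would be read from the t distribution with the returned degrees of freedom (as PRISM does internally).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Unpaired Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes the two groups have similar
    variances, as stated for the comparisons in the Methods.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Made-up illustrative values, not measurements from the paper.
t_value, dof = pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```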
Extended Data Figure 1 | Identification of a genetic CTPS1 defect in
patients P1.1, P1.2 and P2.1. a, Analysis of the single nucleotide variations
(SNVs) detected by whole-exome sequencing in the genome of P1.1, P1.2
and P2.1. The numbers of SNVs are indicated in the triangles. SNVs were
filtered by removal of non-functional intronic and synonymous mutations,
heterozygous variations and those present in the dbSNP and 1000 Genomes databases.
The intersection of the filtered SNVs in the three patients resulted in the
identification of a single common splice-site variation in the CTPS1 gene.
b, Exon-intron structure and sequences of exons 17, 18 and 19 of CTPS1.
The position of the variation is indicated by an arrow. The boxed nucleotide
corresponds to the alternative splice site which produces a shorter transcript
lacking exon 18 detected in patient cells. The alternative stop codon is indicated
by an asterisk. c, Expression of a CTPS1 transcript lacking exon 18 (CTPS1Δ18)
in CTPS1-deficient patients. The relative expression of full-length CTPS1,
CTPS1Δ18 and actin transcripts was examined by qRT-PCR in EBV B-cell
lines (patient P2.1) and T-cell blasts (patient P1.2) from CTPS1-deficient
patients. qRT-PCRs of actin are shown as normalization controls of the cDNA
samples. Threefold serial dilutions of cDNAs (indicated as 1, 0.3 and 0.1) were
used for amplification of each transcript. Base pair markers are shown on
the left. PCR products were verified by sequencing, showing the expression of
an abnormal CTPS1 transcript lacking exon 18 in the cells of the patients.
[Figure panels not reproduced. Panel a filter labels: frequent variations removal (>0.1%); non-functional mutations removal. Panel b coordinates: exon 17, 41,475,117-41,475,261; intron 17-18, 41,475,262-41,475,832; exon 18, 41,475,833-41,475,926; intron 18-19, 41,475,927-41,477,329; exon 19, 41,477,330-41,478,235.]
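
The filtering strategy described in panel a (drop non-functional, heterozygous and already-catalogued variants, then intersect what remains across patients) can be sketched as follows. The record fields, the variant identifiers and the toy data are hypothetical and chosen only to illustrate the logic; they are not the authors' pipeline or real calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    key: str            # hypothetical identifier, e.g. "chr:pos:ref>alt"
    effect: str         # e.g. "missense", "splice_site", "synonymous", "intronic"
    homozygous: bool
    in_public_db: bool  # present in a public variant database


NON_FUNCTIONAL = {"synonymous", "intronic"}


def filter_variants(variants):
    """Keep homozygous, potentially functional variants absent from databases."""
    return {
        v.key
        for v in variants
        if v.effect not in NON_FUNCTIONAL and v.homozygous and not v.in_public_db
    }


def shared_candidates(variants_by_patient):
    """Intersect the filtered variant sets of all patients."""
    filtered = [filter_variants(vs) for vs in variants_by_patient.values()]
    return set.intersection(*filtered) if filtered else set()


# Toy data for three patients; "varA" stands in for the shared splice-site variant.
toy = {
    "P1.1": [Variant("varA", "splice_site", True, False),
             Variant("varB", "synonymous", True, False)],
    "P1.2": [Variant("varA", "splice_site", True, False),
             Variant("varC", "missense", False, False)],
    "P2.1": [Variant("varA", "splice_site", True, False),
             Variant("varD", "missense", True, True)],
}
candidates = shared_candidates(toy)
```

In this toy example only the variant carried in homozygous form by all three patients and absent from the databases survives the intersection.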
Extended Data Figure 2 | Loss of CTPS1 expression and undetectable
expression of the mutant CTPS1Δ18 protein in cells from CTPS1-deficient
patients. a, Transient expression of CTPS1 and the mutant CTPS1Δ18 in
293-T cells transfected with vectors containing wild-type CTPS1 or the mutant
CTPS1Δ18. Cell lysates were tested by immunoblotting for CTPS1 with
different antibodies raised against CTPS1 and for actin as a control for loading.
The CTPS1Δ18 mutant protein is recognized by the rabbit polyclonal
antibodies raised against the 341 to 355 (anti-341-355) or the 416 to 430
(anti-416-430) residues of CTPS1 but not by the rabbit polyclonal antibody
K21. b, T-cell blasts from a healthy control (Ctr.) and the CTPS1-deficient
patient P1.2 (P1.2) stimulated for 48 h with anti-CD3 were analysed for
CTPS1 expression with the rabbit polyclonal antibodies anti-416-430 and
anti-341-355. Actin expression served as control for loading. c, EBV B-cell lines from
healthy controls (Ctr.1 and Ctr.2) and CTPS1-mutated patients (P1.2 and
P2.1) were analysed for CTPS1 expression with the rabbit polyclonal antibody
anti-416-430. Actin expression served as control for loading.
Extended Data Figure 3 | Induction of CTPS1 expression in activated
B cells. a, Immunoblots for CTPS1 expression in sorted CD19⁺ B cells
(from PBMCs of a healthy donor) stimulated with the indicated stimuli. Actin
was used as a loading control. b, Kinetics of CTPS1 mRNA expression
monitored by qRT-PCR in sorted B cells that were stimulated with
anti-BCR+CpG. Expression is in arbitrary units (a.u.) normalized to the
expression of the GAPDH gene, and leukocytes were used as calibrator.
c, Immunoblots for CTPS1 expression in T-cell blasts (from a healthy donor)
stimulated with anti-CD3/CD28 beads in the presence of selective inhibitors of
NF-κB, Src kinases, Ca²⁺, ERK kinase and PI3Kdelta. Actin was used as a
loading control. The activity of the inhibitors was controlled in parallel (see
Methods and data not shown).
Extended Data Figure 4 | Analysis of proximal and late TCR activation
responses in CTPS1-deficient cells. a, Immunoblots showing the
phosphorylation of proximal signalling molecules in T-cell blasts from a
control donor (Ctr.) and a CTPS1-deficient patient P1.2 (P1.2) stimulated with
anti-CD3 antibodies for 0, 2, 5, 15, 30 and 60 min or PMA plus ionomycin
(P + I). Cell lysates were immunoblotted with antibodies against tyrosine-phosphorylated
residues (PY), phosphoPLCG1 (pPLCG1), PLCG1, NFAT2c,
phosphoPKCtheta (pPKCtheta), IκBα, phosphoERK1/2 (pERK1/2) and actin
as a loading control. Molecular weights are on the left. Data correspond to one
representative experiment of 2 or 3 independent experiments. b, Flow
cytometry analyses of Ca²⁺ flux in T cells from PBMCs or T-cell blasts of a
control donor (Ctr.) and a CTPS1-deficient patient P1.2 (P1.2) loaded with
the Ca²⁺-sensitive fluorescent dye Indo-1. Cells were then stimulated with
anti-CD3 antibodies (first arrow) crosslinked with rabbit anti-mouse
antibodies (second arrow) and then incubated with ionomycin (third arrow) to
induce a receptor-independent Ca²⁺ response. Intracellular Ca²⁺ levels are
expressed in arbitrary units (a.u.). Data with the T-cell blasts correspond to one
of three representative experiments. c, Analysis of the degranulation capacity of
CD8⁺ T-cell blasts from two control donors (Ctr.1 and Ctr.2) and a CTPS1-deficient
patient P1.2 (P1.2) stimulated with the indicated concentrations of
anti-CD3 antibodies for 4 h. Cells were stained with antibodies against
CD107a/b (LAMP1/2), a surface-exposed marker of the secretion of lytic
granules, and then analysed by FACS. Means with s.d. of percentages of CD8⁺
CD107⁺ cells are presented. d, Flow cytometry analysis of intracellular IL-2
production in CD4⁺ and CD8⁺ T cells from PBMCs of a control donor (Ctr.)
and two CTPS1-deficient patients P1.2 and P2.2 (P1.2 and P2.2) stimulated
for 36 h with anti-CD28 and anti-CD3 antibodies. The percentages of
CD4⁺IL-2⁺ and CD8⁺IL-2⁺ cells are shown. e, Flow cytometry analysis of
intracellular IFN-γ and TNF-α production on gated CD3⁺ T-cell blasts of a
control donor (Ctr.) and a CTPS1-deficient patient P1.2 (P1.2) stimulated for
12 h with IL-2, anti-CD3 and anti-CD28 coated beads (anti-CD3/CD28), PMA
plus ionomycin or PHA. Data are representative of one of 3 independent
experiments. Dot plots in red correspond to the isotype control. f, Induction of
CD25 and CD69 in CD3⁺ T-cell blasts from a control donor (Ctr.) and a
CTPS1-deficient patient (P1.2) was assessed after 24 h of anti-CD3 stimulation
for CD69 and 96 h for CD25. Expression was assessed by flow cytometry
and the median fluorescence intensity (MFI) is presented. Data are means with
s.d. of four and eight independent experiments for CD69 and CD25,
respectively. Unpaired Student’s t-test. ***P < 0.001, *P < 0.05. g, Analysis of
activation-induced cell death (AICD) in CD3⁺ T-cell blasts from a control
donor (Ctr.) and a CTPS1-deficient patient P1.2 (P1.2) after stimulation with
the indicated concentration of anti-CD3 antibodies for 12 h. Apoptotic cells
were detected by annexin V and 7-AAD staining and the percentages of
annexin V-positive/7-AAD-negative cells within the gated CD3 population are
shown. Data are means with s.d. of four (P1.2, n = 4) and eight (Ctr., n = 8)
independent experiments. Unpaired Student’s t-tests. *P < 0.05.
Extended Data Figure 5 | Decreased proliferation of TCR-stimulated T cells
from patients P1.1, P1.2 and P2.2 and IL-2-expanded natural killer cells
from patient P1.2. a, Proliferation of CD3⁺ T cells from PBMCs of control
donors (Ctr.) and CTPS1-deficient patients (P1.1, P1.2; left panels) or (P1.2,
P2.2; right panels). Right panels and left panels correspond to 2 independent
experiments. Cells were stimulated with immobilized anti-CD3 and soluble
anti-CD28 antibodies over the course of 6 days. The proliferation was
determined by dilution of CFSE staining analysed by flow cytometry.
Histograms correspond to CFSE staining dilutions, for which the number of cell
divisions is indicated at the top of each peak. b, Proliferation of CD3⁺ T and
CD16⁺CD56⁺ natural killer cells from PBMCs of a control donor (Ctr.) and a
CTPS1-deficient patient (P1.2). Cells were stimulated with anti-CD3/CD28
coated beads for 3 days or IL-2 for 7 days. Representative dot plots show
cell divisions by dilution of the violet dye and expression of the activation
marker CD69. Insets with histograms show the violet dye dilution with the
number of cell divisions indicated at the top of each peak.
Extended Data Figure 6 | Decreased incorporation of thymidine, uridine,
cytidine, leucine and aspartate in CTPS1-deficient cells. a, Incorporation of
[³H]thymidine, [³H]uridine, [³H]cytidine and [³H]leucine as tracers of DNA,
RNA and protein synthesis in PBMCs from a control healthy donor (Ctr.) and a
CTPS1-deficient patient (P1.2) stimulated or not (no stim.) for 3 days with
anti-CD3 or PHA and for 6 days with tetanus toxoid, candidin or tuberculin.
The concentration of [³H]cytidine used in these experiments is below the value
allowing the restoration of normal proliferation in CTPS1-deficient cells
(also see Methods). Data are means with s.d. of two independent experiments
with triplicates. Unpaired Student’s t-tests. ***P < 0.001, **P < 0.01,
*P < 0.05. b, c, Incorporation of [¹⁴C]aspartate and [³H]thymidine in PBMCs
(b) or T-cell blasts (c) from a control healthy donor (Ctr.) and a CTPS1-deficient
patient (P1.2) stimulated or not (no stim.) for 3 days with anti-CD3 or
PHA. Data are means with s.d. of three independent samples.
Extended Data Figure 7 | Decreased proliferation of Jurkat cells following
shRNA-mediated CTPS1 downregulation. a, Expression of CTPS1 in Jurkat
cells transduced with lentiviral vectors containing two distinct CTPS1 shRNAs
(Sh CTPS1 #1 or Sh CTPS1 #2) or a scrambled shRNA (Sh scramble). Cell
lysates were analysed by immunoblotting for CTPS1, CTPS2 and actin protein
expression. b, Proliferation of Jurkat cells, in which CTPS1 expression was
silenced, was monitored as a function of the loss of GFP expression and
compared with cells transduced with the scrambled shRNA. The percentages of
GFP-positive cells were determined by flow cytometry at the indicated time
points with ‘time 0’ corresponding to 48 h post-transduction.
Extended Data Figure 8 | Measurements of nucleotide pools in
CTPS1-deficient T cells and in B/EBV cell lines from patients.
a, Concentration of ATP, GTP, UTP and CTP in cell extracts of T-cell blasts
stimulated with anti-CD3/CD28 coated beads or not (no stim.) from control
healthy donors (Ctr.) or from patient P1.2. Control cells were treated or not
with deazauridine for 24 h before and during stimulation. Representative
data from 3 independent experiments. b, Same as in a with EBV B-cell lines
from control healthy donors (Ctr.) and patients P1.1 (squares), P1.2 (circles)
and P2.1 (triangles). For controls, each symbol corresponds to a different
control cell line (from a different healthy donor). Representative data from two
independent experiments with blinding during the measurements. Bars
correspond to averages. Unpaired Student’s t-tests. *P < 0.05, **P < 0.01,
***P < 0.001. CTP data are also shown in Fig. 3g, h.
Extended Data Table 1 | Immunological data of CTPS1-deficient patients


Patient P1.1 
Age (months) 38 
Cell subsets (cells mm⁻³)

PMN (1800-8000) 2300 
Lymphocytes (1700-6900) 3500 
CD3* (900-4500) 4390 
CD4* (500-2400) 1410 
CD8* (300-1600) 2580 
CD19+ (200-2100) 1010 
CD56+ (100-1000) 

T-cell proliferation 

PHA low 
Serum immunoglobulins (g l⁻¹)

IgG in 
IgG2 os 
IgA eam 
ipo ce ae 
Specific antibodies (IU ml⁻¹)

Tetanus (> 0.1) 0.16 
H. influenzae type B (>1) 7.0 

S. pneumoniae (> 0.35) low


PMN, polymorphonuclear neutrophils. PHA, phytohaemagglutinin. Ig, immunoglobulin. 


P1.2


113 


1100 


3400 


1680 


680 


910 


540 


480 


normal 


29.5


(8.3-14.3) 


172 
(1-1.9) 


0.37 


(0.7-1.3) 


P2.1 


110 


400 
280 
160 
120 
60 


30 


10.6 
(8.3-14.3) 


0.23 
(4) 


0.74 
(1-1.9) 


0.72 
(0.7-1.3) 


0.48 
0.74 


protective 


P2.2 


48 


4500 


1600 


990 


570 


480 


170 


120 


normal 


5.53 


(6.8-11.8) (6.8-11.8) 


0.2 
(4) 


0.42 
(0.7-1.3) 


0.42 
(0.5-1.1) 


0.05 
05 


low 


P3.1* 


60 


1100 
200 
80 
43 
28 
12 


20 


12.8 


8.73 
(0.7-1.3) 


0.76 
(0.5-1.1) 


1.44 
0.86 


low 


Values in bold are below the normal age-matched ranges, which are indicated in brackets.
*when acutely unwell with VZV. 
**at presentation with EBV-driven LPD 


P3.2* P4 
20 24 
6400 
2400 4000 
1461 2160 
1095 1320 
346 880 
297 760 
128 
24.5 17.9
(5.3-10.1)  (6.8-11.8) 
1 1.5 
(0.3-0.8) (0.7-1.3) 
1.4 1.2 
(0.5-1.1) (0.5-1.1) 
0.03 
low 


P5** 


60 


450 
440 
340 
314 
35 
13 


16 


normal 


48 
(4.2-8) 


44 
(0.2-0.7) 


0.8 
(0.5-1.1) 


0.57 
0.06 


low 


Different immunological parameters of patients were tested from blood (numbers of cells), PBMCs (proliferation in response to PHA evaluated by incorporation of [³H]thymidine) and serum (immunoglobulin subclasses and specific antibodies).
Extended Data Table 2 | Immunological features of PBMCs from patient P1.2 at 8 and 9 years

Parameter, followed by normal age-matched values in brackets

Lymphocytes (cells mm⁻³) (1900-3700)
T cells
CD3⁺ (cells mm⁻³) (1200-2600)
CD4⁺ (cells mm⁻³) (650-1500)
CD8⁺ (cells mm⁻³) (370-1100)
CD4/CD8 ratio (0.9-2.6)
TCRα/β (%) (26-85)
TCRγ/δ (%) (0.2-14)
CD31⁺CD45RA⁺ / CD4⁺ (recent naive thymic emigrant) (%) (43-55)
CD45RO⁺ / CD4⁺ (memory) (%) (13/30)
CD45RO⁺ / CD4⁺ (memory) (cells mm⁻³) (85/450)
CCR7⁺CD45RA⁺ / CD8⁺ (naive) (%) (52/68)
CCR7⁺CD45RA⁻ / CD8⁺ (central memory) (%) (3-4)
CCR7⁻CD27⁻CD45RA⁻ / CD8⁺ (effector memory) (%) (11/20)
CCR7⁻CD27⁻CD45RA⁻ / CD8⁺ (effector memory) (cells mm⁻³) (42/220)
CCR7⁻CD27⁻CD45RA⁺ / CD8⁺ (exhausted effector memory, EMRA) (%) (1-18)
CD127lowCD25⁺ / CD4⁺ (regulatory) (%) (2-8)
Vα7⁺CD161⁺ / CD3⁺ (MAIT) (%) (1-8)
Vα24⁺Vβ11⁺CD161⁺ / CD3⁺ (NKT) (%) (>0.02)
T-cell proliferation (c.p.m. × 10³)
PHA (6.25 mg ml⁻¹) (>50)
OKT3 (50 ng ml⁻¹) (>30)
Candidin (>10)
Tetanus toxoid (>10)
Tuberculin (>10)
NK cells
CD16⁺CD56⁺ (cells mm⁻³) (100-480)
CD16⁺CD56⁺ (%) (4-17)
B cells
CD19⁺ (cells mm⁻³) (270-860)
CD19⁺ (%) (13-27)
CD21⁺CD27⁺ / CD19⁺ (memory) (%) (11-24)
IgD⁺IgM⁺ / CD19⁺CD21⁺CD27⁺ (marginal zone) (%) (31-51)
IgD⁻IgM⁻ / CD19⁺CD21⁺CD27⁺ (switched) (%) (21-49)


8 years 


1700 


1207 
527 


612 


3.8 


3.92 
9 years


1800 


1188 


Seb 


1,2 


162 


378 


23.2 


Different T-cell subsets, B-cell subsets and natural killer cells from PBMCs were tested by flow cytometry. T-cell proliferation from PBMCs in response to different stimuli, including antigen-specific responses, was analysed.
A Ctf4 trimer couples the CMG helicase to DNA
polymerase α in the eukaryotic replisome


Aline C. Simon, Jin C. Zhou, Rajika L. Perera, Frederick van Deursen, Cecile Evrin, Marina E. Ivanova, Mairi L. Kilkenny,
Ludovic Renault, Svend Kjaer, Dijana Matak-Vinković, Karim Labib, Alessandro Costa & Luca Pellegrini


Efficient duplication of the genome requires the concerted action of
helicase and DNA polymerases at replication forks to avoid stalling
of the replication machinery and consequent genomic instability.
In eukaryotes, the physical coupling between helicase and DNA polymerases
remains poorly understood. Here we define the molecular
mechanism by which the yeast Ctf4 protein links the Cdc45-MCM-GINS
(CMG) DNA helicase to DNA polymerase α (Pol α) within the
replisome. We use X-ray crystallography and electron microscopy to
show that Ctf4 self-associates in a constitutive disk-shaped trimer. Trimerization
depends on a β-propeller domain in the carboxy-terminal
half of the protein, which is fused to a helical extension that protrudes
from one face of the trimeric disk. Critically, Pol α and the
CMG helicase share a common mechanism of interaction with Ctf4.
We show that the amino-terminal tails of the catalytic subunit of Pol
α and the Sld5 subunit of GINS contain a conserved Ctf4-binding
motif that docks onto the exposed helical extension of a Ctf4 protomer
within the trimer. Accordingly, we demonstrate that one Ctf4
trimer can support binding of up to three partner proteins, including
the simultaneous association with both Pol α and GINS. Our findings
indicate that Ctf4 can couple two molecules of Pol α to one CMG helicase
within the replisome, providing a new model for lagging-strand
synthesis in eukaryotes that resembles the emerging model for the
simpler replisome of Escherichia coli. The ability of Ctf4 to act as
a platform for multivalent interactions illustrates a mechanism for
the concurrent recruitment of factors that act together at the fork.

Recent evidence indicates that the leading- and lagging-strand polymerases
are anchored to the helicase by replisome components that lack
counterparts in bacteria. The yeast Ctf4 protein is among the best
characterized of these factors: its role is to link the CMG helicase with
Pol α, the polymerase subunit of the Pol α-primase complex that initiates
Okazaki fragments during lagging-strand synthesis. Ctf4 is part
of a conserved family of replication factors that includes human AND-1
and fission yeast Mcl1, and is required for efficient DNA synthesis, normal
cell-cycle progression and genomic stability. In addition to their
role in DNA replication, Ctf4 and AND-1 perform an important yet
poorly understood function in sister chromatid cohesion.

Earlier work had shown that Ctf4 binds directly to the GINS subunit 
of the CMG helicase and to the catalytic subunit of Pol «, via the C- 
terminal half of the protein that does not include an annotated WD40 
domain in the N terminus of Ctf4 (ref. 10). We identified by bioinfor- 
matic analysis a second WD40 domain in the C-terminal half of yeast 
Ctf4, juxtaposed to a predicted helical region. Crystallographic analysis 
of residues 471-927 (C-end; 457 amino acids) of yeast Ctf4 (Ctf4(CTD)) 
confirmed the presence of a six-bladed β-propeller domain fused to a 
helical bundle of six α-helices arranged in a stack of helical hairpins 
(Extended Data Table 1 and Extended Data Fig. 1). Notably, the structural 
analysis revealed a trimeric assembly of Ctf4 molecules, resulting 
from side-on packing of β-propeller domains (Fig. 1a). The homotypic 
association of the β-propeller domains generates a discoidal shape with 
three-fold symmetry. The helical domains of each Ctf4 protomer extend 
upwards and away from the plane of the trimer, like legs of a three-legged 
stool. The trimeric assembly seems to be constitutive as it buries a total 
surface area of 8,100 Å², an average of 2,700 Å² per interface. The existence 
of Ctf4 as a constitutive trimer mediated by self-association of its 
C-terminal domain was confirmed by single-particle electron microscopy 
(EM) analysis, which showed the presence of three-fold symmetry 
in particles of full-length Ctf4 and Ctf4(CTD) (Fig. 1b, c and Extended 
Data Fig. 2). The EM analysis of full-length Ctf4 further revealed that 
the N-terminal WD40 domains depart radially from the Ctf4(CTD) trimer, 
to which they are loosely connected (Fig. 1b). The presence of Ctf4 
as a stable trimer in solution was demonstrated by multi-angle laser scattering 
(MALS) of Ctf4(CTD) and full-length Ctf4 (Fig. 1d and Extended 
Data Fig. 2) and non-denaturing nano-electrospray ionization mass spectrometry 
(native mass spectrometry) of Ctf4(CTD) (Fig. 1e). 
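
As a quick consistency check on the oligomeric state, the masses quoted in the Figure 1 legend can be compared with the predicted trimer mass. The sketch below is illustrative only: the function names are ours, and the monomer mass is simply taken as one third of the predicted trimer mass, which is an assumption rather than a value stated in the text.

```python
# Masses reported in the Figure 1 legend.
PREDICTED_TRIMER_DA = 163_148   # predicted Ctf4(CTD) trimer
NATIVE_MS_DA = 163_195          # native mass spectrometry
MALS_DA = 161_100               # SEC-MALS (161.1 kDa)

# Assumption: monomer mass = predicted trimer mass / 3.
MONOMER_DA = PREDICTED_TRIMER_DA / 3

# Buried surface: 8,100 A^2 in total, shared by three equivalent interfaces.
BURIED_TOTAL_A2 = 8_100
BURIED_PER_INTERFACE_A2 = BURIED_TOTAL_A2 / 3


def copies_per_particle(measured_da: float) -> float:
    """Number of protomers implied by a measured particle mass."""
    return measured_da / MONOMER_DA


def percent_deviation(measured_da: float) -> float:
    """Signed deviation (%) of a measurement from the predicted trimer mass."""
    return 100.0 * (measured_da - PREDICTED_TRIMER_DA) / PREDICTED_TRIMER_DA


if __name__ == "__main__":
    for label, mass in (("native MS", NATIVE_MS_DA), ("MALS", MALS_DA)):
        print(f"{label}: {copies_per_particle(mass):.2f} protomers, "
              f"{percent_deviation(mass):+.2f}% vs predicted trimer")
    print(f"buried area per interface: {BURIED_PER_INTERFACE_A2:.0f} A^2")
```

Both measurements round to three protomers per particle, and the per-interface area reproduces the 2,700 Å² average given in the text.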

We had previously shown that Ctf4 binds to the N-terminal portion 
of Pol1, the yeast orthologue of Pol α (ref. 10). By progressive truncations 
of this largely unstructured region of Pol1, we identified a short 
linear motif spanning residues 137-149 that is necessary and sufficient 
for the association with Ctf4 in vitro. The motif has a mixed acidic and 
hydrophobic nature and is conserved from yeast to humans (Fig. 2b). 
Alanine scanning mutagenesis of the motif revealed that conserved residues 
F140, D142, I143, L144 and F147 are essential for the interaction 
with Ctf4 (Fig. 2c). The results of the biochemical experiments with 
recombinant proteins were confirmed by immunoprecipitation of Pol1 
from extracts of yeast cells that were synchronized in G1 phase before 
release into S phase. Whereas wild-type Pol1 associated with Ctf4 and thus 
with the components of the Cdc45-MCM-GINS complex (Fig. 2d, control), 
the Pol1-A allele with alanine substitutions at D141, D142, L144 
and F147 was unable to interact with either Ctf4 or the CMG (Fig. 2d, 
pol1-A). 

To define the structural basis for the interaction between Ctf4 and Pol α, 
we soaked a 13-amino-acid peptide corresponding to Pol1 sequence 
137-IDNFDDILGEFES-149 in the Ctf4(CTD) crystals. For the soaking 
experiments, we used a different crystal form of the Ctf4(CTD) trimer 
that is easier to grow; this form captures a topologically open conformation 
of the Ctf4(CTD) trimer resembling a cracked ring (Extended 
Data Fig. 3 and Supplementary Video 1). No differences are observed in 
protomer structure between closed and open forms of the Ctf4 trimer, 
with the exception of the helical domain located at the gap in the open 
form that becomes disordered in the electron density map. The potential 
functional significance of the open and closed forms of the Ctf4 trimer 
and their interconversion in solution is presently unclear. 

The crystal structure of the Ctf4(CTD)-Pol α complex shows that 
the helical domain protruding from the discoidal trimer is responsible 
for binding the polymerase (Fig. 2e). In the structure, amino acids 
140-FDDILGEFES-149 of Pol1 fold into a two-turn α-helix that packs in 
antiparallel fashion against helices α3 and α5, on the outward-facing side 
of the helical domain of one Ctf4 protomer. The Ctf4-binding motif of 
Pol1 occupies each of the two binding sites available in this crystal form 
of the Ctf4(CTD) trimer. The interaction with the Pol1 peptide does not 
induce an appreciable conformational change in the helical domain of 
Ctf4(CTD) nor does it alter its position in the trimeric structure. The side 
chains of Pol1 residues F140, I143, L144 and F147, which were critical for 
the interaction in the pull-down assay, become buried at the interface and 
pack against a continuous hydrophobic surface formed by Ctf4 residues 
L867, A871, A894, A897 and I901 (Fig. 2f). The interaction is augmented 
by polar contacts on the perimeter of the hydrophobic interface, between 
acidic residues D141, D142, E146 and E148 of Pol1 and basic residues 
K864, R868, R893, K900 and R904 of Ctf4 (Extended Data Figs 4a and 5). 
The salt bridge between the conserved residues D142 of Pol1 and R904 
in helix α5 of Ctf4 seems to be particularly important for binding, as 
the D142A mutation abolishes the interaction (Fig. 2c). 

Figure 1 | Architecture of yeast Ctf4. a, Ctf4 self-associates in a trimer of 
novel design. The panel shows top and side views of the crystal structure of the 
C-terminal region of yeast Ctf4 (Ctf4(CTD); amino acids 471-927). The 
protein is drawn as ribbon representation, coloured according to its domain 
structure: the β-propeller domain is in light blue and the helical domain in 
yellow. Above the drawing, a bar diagram shows the domain structure of 
full-length yeast Ctf4 and the extent of the region crystallized in our study. 
b, Analysis of full-length Ctf4 by single-particle electron microscopy. 
Multivariate statistical symmetry analysis detects a three-fold symmetry 
component for the full-length Ctf4 particle. Reference-free class averages of 
full-length Ctf4 reveal a core structure flexibly linked to up to three satellite 
domains. c, Analysis of Ctf4(CTD) by single-particle electron microscopy. The 
C-terminal domain of Ctf4 maintains a trimeric structure, as shown by 
multivariate statistical symmetry analysis and reference-free class averages. 
d, Size exclusion chromatography-multi-angle laser scattering analysis of yeast 
Ctf4(CTD). The light scattering is plotted alongside the fitted molecular 
masses. The protein eluted in a single peak, corresponding to a measured 
molecular mass of 161.1 kDa. The predicted molecular mass for the trimeric 
species is 163.1 kDa. e, Native mass-spectrometry analysis of yeast Ctf4(CTD). 
The measured molecular mass of 163,195 Da matches closely the predicted 
molecular mass of 163,148 Da for a trimeric species. 

Figure 2 | Pol α contains a Ctf4-interacting motif that binds to the helical 
domain of Ctf4. a, Identification of the Ctf4-binding motif of Pol1, the yeast 
orthologue of Pol α. GST-tagged constructs spanning progressively smaller 
N-terminal regions of Pol1 were tested for interaction with Ctf4(CTD) in 
pull-down experiments on glutathione sepharose beads. The top panel shows 
the boundaries of the GST-Pol1 constructs; the bottom panel shows the result 
of the pull-down experiments, analysed by SDS-PAGE. The last lane on the 
right-hand side of the gel contains only Ctf4(CTD). The position of the 
Ctf4(CTD) band in the pull-down experiments is highlighted by a box. The 
asterisk marks the position of GST-Pol1 (121-348), which overlaps partially 
with Ctf4(CTD). The first lane contains the protein size markers, with their 
molecular masses reported in kilodaltons on the left-hand side of the gel. 
b, Multiple sequence alignment of the Ctf4-binding motif of yeast Pol1 
(Sc; Saccharomyces cerevisiae) with Pol α sequences from Schizosaccharomyces 
pombe (Sp), Danio rerio (Dr), Drosophila melanogaster (Dm) and Homo 
sapiens (Hs). Invariant residues are highlighted in green, identical residues in 
yellow and similar residues in cyan. The asterisk marks amino acids that are 
essential for interaction with Ctf4(CTD) (see panel c). c, Alanine-scanning 
mutagenesis of the Ctf4-binding motif. Pol1 residues 137-149 were fused to 
GST and each amino acid between 140 and 149 (except G145) was mutated to 
alanine. The effect of each single-point mutation on the interaction with 
Ctf4(CTD) was tested by GST pull-down and analysed by SDS-PAGE. WT, 
wild type. The first lane contains the protein size markers, with their molecular 
masses reported in kilodaltons on the left-hand side of the gel. d, The budding 
yeast strains POL1-9MYC (control) and pol1-A-9MYC (pol1-A, containing 
the D141A, D142A, L144A and F147A mutations in the endogenous POL1 
locus) were grown at 24 °C, arrested in G1 phase and released into S phase 
for 30 min. The Myc-tagged proteins were isolated from cell extracts by 
immunoprecipitation on anti-Myc beads and the indicated proteins were 
detected by immunoblotting with the corresponding antibodies”’. e, Co-crystal 
structure of Ctf4(CTD) bound to a peptide corresponding to the Ctf4-binding 
motif of Pol α. Ctf4 is drawn as in Fig. 1a, the Ctf4-binding motif of Pol α is 
drawn as a green ribbon. f, Detailed view of the interaction between the 
Ctf4-binding motif of Pol α (green tube) and the helical domain of Ctf4 
(yellow ribbon). The side chains of Pol1 residues F140, D142, I143, L144, F147 
and Ctf4 residue R904 are shown as stick representation. 

We next set out to investigate the mode of Ctf4 interaction with the 
CMG helicase. The association between Ctf4 and GINS is sufficiently 
strong to be assayed by size-exclusion chromatography’° (Fig. 3a). We 
found that the interaction is dependent on the unstructured tail at the 
N terminus of the Sld5 subunit of GINS (Fig. 3a). Sequence comparison 
of fungal Sld5 orthologues revealed a conserved pattern of amino 
acids that is highly similar to the Ctf4-binding motif of Pol1 (Fig. 3b). 
Further dissection of the N-terminal tail of Sld5 confirmed that the initial 
18 residues, containing the Ctf4-binding motif, were essential for the 
interaction (Fig. 3c). Our identification of a Ctf4-binding site in Sld5 is 
in agreement with earlier reports”"°, which had identified Sld5 as a Ctf4-binding 
subunit of GINS. Whether a conserved Ctf4-binding motif is 
present in the Sld5 sequence of higher eukaryotes is presently unclear. 
Alanine scanning mutagenesis of the Ctf4-binding motif of Sld5 highlighted 
the importance of the same pattern of conserved hydrophobic 
residues that were essential for Pol1 binding to Ctf4. Alanine mutation 
of I5, I8 and L9, corresponding to residues F140, I143 and L144 of Pol1, 
abolished the association of Sld5 with Ctf4, whereas mutation of L12, 
equivalent to Pol1 F147, weakened the interaction (Fig. 3d). 

Figure 3 | The Sld5 subunit of yeast GINS shares a common mechanism of 
Ctf4 binding with Pol α. a, Analysis of the Ctf4-GINS interaction by gel 
filtration chromatography, using Ctf4(CTD) and versions of GINS that contain 
either full-length (top panel) or N-terminally truncated Sld5 (Sld5ΔN; bottom 
panel). b, Multiple sequence alignment of the N terminus of fungal Sld5 
sequences (Sc, Saccharomyces cerevisiae; Ag, Ashbya gossypii; An, Aspergillus 
niger; Gz, Gibberella zeae; Ca, Candida albicans). Invariant residues are 
highlighted in green, identical residues in yellow and similar residues in cyan. 
The Ctf4-binding motif of yeast Pol1 is reported below the alignment. 
c, Mapping of the Ctf4-binding sequence in the N terminus of Sld5 by GST 
pull-down analysis. The top panel shows the boundaries of the GST-Sld5 
constructs tested for interaction with Ctf4(CTD); the bottom panel shows the 
results of the pull-down experiments, analysed by SDS-PAGE. The band 
marked with an asterisk corresponds to free GST. The first lane contains the 
protein size markers, with their molecular masses reported in kilodaltons on 
the left-hand side of the gel. d, Alanine-scanning mutagenesis of the 
Ctf4-binding motif. Residues 1-20 of yeast Sld5 were fused to GST and each 
position between 3 and 13 (except A10) was mutated to alanine. The effect of 
each single-point mutation on the interaction with Ctf4(CTD) was tested by 
GST pull-down and analysed by SDS-PAGE. WT, wild type. The first lane 
contains the protein size markers, with their molecular masses reported in 
kilodaltons on the left-hand side of the gel. e, Co-crystal structure of Ctf4(CTD) 
bound to a peptide corresponding to the Ctf4-binding motif of Sld5. Ctf4 is 
drawn as in Fig. 1a, the Ctf4-binding motif of Sld5 is drawn as red ribbon. 
f, Detailed view of the interaction between the Ctf4-binding motif of Sld5 
(red tube) and the helical domain of Ctf4 (yellow ribbon). The side chains 
of Sld5 residues I3, I5, D7, I8, L9, L12 and Ctf4 residue R904 are shown in 
stick representation. 

To ascertain whether the similarity in the two Ctf4-binding motifs 
extended to their mechanism of interaction, we soaked the Ctf4(CTD) 
crystals with the peptide MDINIDDILAELDKETTAYV, corresponding 
to amino acids 1-19 of yeast Sld5. The crystallographic analysis revealed 
a near-identical mode of interaction between Sld5 and Ctf4 as previously 
observed for the Ctf4-binding motif of Pol1 (Figs 3e and 4a and 
Extended Data Figs 4b and 5). The Ctf4-binding sequence of Sld5 includes 
an additional hydrophobic contact of I3 with Ctf4 residues A871, C874 
and I901, which might account for its tighter (~5-fold) association with 
Ctf4 compared to Pol1 (Fig. 3f and Extended Data Fig. 6a). Conversely, 
the contribution of polar contacts seems diminished, as disruption of the 
salt bridge between D7 in Sld5 and R904 in Ctf4 does not impair appreciably 
the interaction (Fig. 3d, f). Collectively, these findings establish 
that Pol1 and GINS contain a Ctf4-binding motif that is conserved in 
sequence and function. 
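
The residue equivalences described above can be made explicit by placing the two peptide sequences in register. The following sketch is only an illustration: the offset of 135 between Sld5 and Pol1 numbering is inferred from the equivalences stated in the text (Sld5 I5, I8, L9 and L12 with Pol1 F140, I143, L144 and F147), and the helper names are hypothetical.

```python
POL1_PEPTIDE = "IDNFDDILGEFES"          # Pol1 residues 137-149
POL1_START = 137
SLD5_PEPTIDE = "MDINIDDILAELDKETTAYV"   # Sld5 N-terminal peptide, numbered from 1
SLD5_START = 1

# Assumption: register inferred from the equivalences given in the text
# (Pol1 residue number = Sld5 residue number + 135).
OFFSET = 135


def aligned_pairs():
    """Return (Sld5 label, Pol1 label) pairs for every Pol1 peptide position."""
    pairs = []
    for i, pol1_aa in enumerate(POL1_PEPTIDE):
        pol1_pos = POL1_START + i
        sld5_pos = pol1_pos - OFFSET
        sld5_index = sld5_pos - SLD5_START
        if 0 <= sld5_index < len(SLD5_PEPTIDE):
            sld5_aa = SLD5_PEPTIDE[sld5_index]
            pairs.append((f"{sld5_aa}{sld5_pos}", f"{pol1_aa}{pol1_pos}"))
    return pairs


def identical_positions():
    """Pairs in which both motifs carry the same amino acid."""
    return [(s, p) for s, p in aligned_pairs() if s[0] == p[0]]


if __name__ == "__main__":
    for sld5, pol1 in aligned_pairs():
        marker = "*" if sld5[0] == pol1[0] else " "
        print(f"{sld5:>4}  {pol1:<5} {marker}")
```

In this register the salt-bridge residue D7 of Sld5 pairs with D142 of Pol1, and the two positions excluded from alanine scanning (A10 and G145) also coincide.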

In yeast cells, Ctf4 appears to associate more tightly with the CMG 
helicase than with Pol α, as the association of Ctf4 with CMG resists buffers 
containing 700 mM salt, whereas the Ctf4-dependent association of Pol α 
with the replisome is lost at 300 mM salt”. Consistent with this, Ctf4 
remains associated with CMG in cells containing mutations that disrupt 
the Ctf4-binding site of Sld5 (Extended Data Fig. 6b). It thus seems 
likely that Ctf4 has a more complex interaction with the CMG than with 
Pol α and that additional contacts between Ctf4 and CMG remain to 
be characterized. 

Our findings predict that Ctf4 can support simultaneous interactions 
of varying stoichiometry and with multiple partners (Fig. 4b). Indeed, 
native mass-spectrometry analysis of Ctf4(CTD) in the presence of the 
Ctf4-binding sequences of Pol α and Sld5 showed reconstitution of complexes 
with 1:1, 1:2 and 1:3 Ctf4-to-peptide stoichiometries (Fig. 4c). To 
determine whether a Ctf4 trimer could support concomitant binding of 
three partner molecules, we analysed by EM reconstituted Ctf4(CTD)-GINS 
complexes (Extended Data Fig. 7). In agreement with the mass-spectrometry 
data, the EM analysis demonstrated the presence of 1, 2 
or 3 copies of GINS bound to one Ctf4(CTD) trimer (Fig. 4d and Supplementary 
Video 2), each arranged radially around the Ctf4(CTD) trimer. 
Interestingly, each GINS molecule occupies a fixed position relative to 
the Ctf4 trimer, indicating that the interface between Ctf4 and GINS 
extends beyond the contact provided by the flexible N-terminal tail of 
Sld5. The Ctf4-GINS interface discernible in our EM averages is probably 
important to sustain the association between Ctf4 and the CMG 
helicase in the replisome. 

The reported function of Ctf4 as a physical link between helicase and 
polymerase prompted us to determine whether GINS and Pol α can simultaneously 
associate with the Ctf4 trimer. We visualized by EM reconstituted 
hetero-assemblies of Ctf4(CTD) bound concurrently to GINS 
and the N-terminal region of Pol1 (residues 1-351) fused to protein A 
(Pol1(NTD)) (Extended Data Figs 8 and 9). As predicted by the trimeric 
nature of Ctf4, we could detect Ctf4(CTD)-GINS-Pol1(NTD) complexes 
of varied stoichiometries, with partial or full occupancy of the Ctf4 trimer 
(Fig. 4e). These data establish a structural basis for Ctf4 as the bridging 
factor between the CMG helicase and DNA polymerase α in eukaryotic 
replication. 
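
The range of assemblies that a trimer with three binding sites can form with two partners can be enumerated directly. The sketch below treats the three sites as equivalent and independent, which is a simplifying assumption made here for illustration; the names are hypothetical and the enumeration says nothing about which species predominate in cells.

```python
from itertools import combinations_with_replacement

SITE_STATES = ("empty", "GINS", "Pol1")   # possible occupant of one binding site
SITES_PER_TRIMER = 3


def compositions():
    """All distinct occupancy compositions of one trimer (site order ignored)."""
    return list(combinations_with_replacement(SITE_STATES, SITES_PER_TRIMER))


def bridging_compositions():
    """Compositions containing at least one GINS and at least one Pol1."""
    return [c for c in compositions() if "GINS" in c and "Pol1" in c]


def describe(composition):
    """Format a composition as a trimer:GINS:Pol1 stoichiometry string."""
    n_gins = composition.count("GINS")
    n_pol1 = composition.count("Pol1")
    return f"Ctf4(3):GINS({n_gins}):Pol1({n_pol1})"


if __name__ == "__main__":
    print(len(compositions()), "compositions in total")
    for c in bridging_compositions():
        print(describe(c))
```

Of the ten possible compositions, three contain both partners: one GINS with one Pol1, two GINS with one Pol1, and one GINS with two Pol1.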

Inside the cell, the appropriate stoichiometry will presumably be determined 
by the constraints imposed on replisome assembly during replication 
initiation. Within the replisome, one binding site of the Ctf4 trimer 
is likely to engage in a constitutive interaction with GINS, to anchor Ctf4 
to the CMG helicase at the fork. In the molecular model of the CMG (ref. 24), 
the GINS structure (ref. 25) has the Sld5 N terminus favourably positioned for 
binding Ctf4, in agreement with our biochemical findings (Fig. 4f). As 
replisomes formed at a replication origin need not remain physically 
tethered for efficient replication”, it is unlikely that Ctf4 acts by coupling 
two CMG helicases. The other two binding sites of the CMG-bound 
Ctf4 trimer would remain available for interaction with Pol α, indicating 
that two copies of the Pol α-primase complex might work together 
during lagging-strand synthesis (Fig. 4g). Such coupling of helicase and 
polymerase in the eukaryotic replisome would be functionally analogous 
to the emerging model of the E. coli replisome, where two DNA polymerases 
cooperate in lagging-strand synthesis to increase processivity 
and efficiency of nucleotide polymerization’. 

Figure 4 | The Ctf4 trimer coordinates the recruitment of replication 
factors to the fork. a, Superposition of the structures of Ctf4(CTD) bound to 
the Ctf4-binding motif of Pol α (green tube) or Sld5 (red tube). Ctf4(CTD) 
is displayed as molecular surface, in light brown. b, Ctf4(CTD) can associate 
in principle with up to three partner proteins. To illustrate this point, the 
Ctf4-binding motif of Pol α was modelled in each of the three binding sites of 
Ctf4(CTD). The helical Ctf4-binding motif is shown as a white cylinder, and 
Ctf4(CTD) is drawn as a molecular surface, in light blue. c, Native mass 
spectrometry analysis of the Ctf4(CTD) trimer in the presence of peptides 
corresponding to the Ctf4-binding motifs of Pol α (top) and Sld5 (bottom). 
d, Single-particle electron microscopy analysis of the interaction of GINS with 
the Ctf4(CTD) trimer. Reference-free class averages of Ctf4(CTD) bound to 
one (top row), two (middle row) or three copies (bottom row) of GINS are 
shown. e, Reference-free class averages of the Ctf4(CTD)-Pol1(NTD) 
(top row), Ctf4(CTD)-Pol1(NTD)-GINS (middle row) and Ctf4(CTD)-Pol1(NTD)-(GINS)₂ 
(bottom row) heteroassemblies. f, The panel shows the 
crystal structure of human GINS (ref. 25) docked into the electron microscopy 
reconstruction of the CMG helicase (ref. 24). The Sld5 subunit of GINS is 
coloured orange and the rest of GINS is shown in white. The density for the 
MCM and Cdc45 subunits of the CMG helicase is shown as a grey surface, 
whereas the density of the GINS tetramer is shown as an outline. The position 
of MCM2, MCM3, MCM5 and Cdc45, which surround GINS in the helicase 
complex, is indicated. An arrow marks the N-terminal residue in the Sld5 
structure. g, A model of Ctf4 function at the replication fork, as the physical 
bridge between the CMG helicase and the DNA polymerase α/primase 
complex. The additional contacts between Ctf4 and GINS suggested by the EM 
analysis (panel d) are indicated by dashed lines. 

In addition to its function as a helicase-polymerase bridge, Ctf4 seems 
to be ideally suited to fulfil a wider role in replication, as a platform for 
coordinating the activity of replication factors at the fork. In this model, 
one Ctf4 protomer would keep the trimer constitutively anchored to the 
CMG, whereas other replisome components, including Pol α, would 
engage with the helicase in a dynamic interaction mediated by the Ctf4-binding 
motif identified here. We note that this model of Ctf4 function 
is reminiscent of the way the proliferating cell nuclear antigen (PCNA) 
interacts with replication factors such as Fen1 and DNA ligase I (ref. 27). 
Thus, in addition to bridging CMG helicase and Pol α, Ctf4 might recruit 
to the fork other factors required for efficient replication under normal 
conditions or needed to deal with exceptional situations during replicative 
stress. 


METHODS SUMMARY 


Included in the Methods section are full experimental procedures for the crystallographic 
and electron microscopy analysis of Ctf4 and its complexes with Pol α and 
GINS, the biophysical characterization of Ctf4 by multi-angle laser scattering and 
native mass spectrometry, and the in vitro and ex vivo biochemical analysis of the 
interaction of Ctf4 with Pol α and Sld5. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 

METHODS 

DNA constructs for X-ray crystallography, MALS and MS of Ctf4(CTD) and 
biochemical analysis of the Ctf4(CTD)-Poll and Ctf4(CTD)-Sld5 interactions. 
Fold recognition analysis in Phyre2 (ref. 28) predicted that the C-terminal half of 
yeast Ctf4, responsible for interactions with GINS and Pol α, contained a WD40 
domain fused to a helical region. A region of yeast Ctf4 comprising amino acids 
471-927 (natural C-end; Ctf4(CTD)) was PCR amplified from S. cerevisiae genomic 
DNA and cloned into a bacterial pRSFDuet-1 T7 expression plasmid (Novagen) 
via unique BamHI and AvrII sites. Using PCR primer extension, a TEV protease 
site was introduced at the start of the Ctf4(CTD) open reading frame sequence and 
after the N-terminal His₆-affinity tag encoded by the pRSFDuet-1 vector. 

The DNA polymerase α (Pol α)- and Sld5-GST fusion constructs used in pull-down 
experiments were generated by insertion of the appropriate nucleotide sequence 
into the NcoI and XhoI sites of the pGAT2 T7 expression plasmid encoding a 
thrombin-cleavable N-terminal GST fusion protein’. 

A construct for bacterial expression of yeast GINS was prepared starting from 
vector pKL653 (ref. 10), by subcloning one expression cassette comprising psf3 
and psf1ΔC (amino acids 1-164) into the NcoI and NotI sites in the first MCS of a 
pRSFDuet-1 expression plasmid, and another expression cassette comprising psf2 
with an N-terminal His₆ affinity tag and sld5 into the second MCS of pRSFDuet-1, 
resulting in the polycistronic pGINS-Duet-1 expression plasmid. The GINS(Sld5ΔN) 
construct used for analytical gel filtration experiments was derived from the 
pGINS-Duet-1 vector, by replacing the second expression cassette with a modified cassette 
that encodes, in addition to His₆-psf2, a version of sld5 coding for a truncated 
protein lacking the first 48 amino acids at its N terminus. 

DNA constructs for electron microscopy and MALS of full-length Ctf4. Full-length 
S. cerevisiae Ctf4 and Ctf4 N-terminal deletion (Ctf4(CTD), amino acids 461-927) 
constructs were both cloned into the pET28c vector (Novagen) to express an 
N-terminal His₆ affinity tag. The S. cerevisiae GINS Psf1 C-terminal deletion (ΔCT, 
amino acids 1-164) construct was subcloned from a previously described GINS 
operon-containing plasmid’? into the pET28c vector and carries an N-terminal Strep II 
tag in the GINS Psf3 subunit. The Pol1-protein A fusion was subcloned into the 
pET Strep II-TEV LIC vector (QB3 MacroLab) by ligation independent cloning”. 
This construct contains in the following order: an N-terminal Strep II tag, the 
N-terminal domain (amino acids 1-351) of S. cerevisiae Pol1, the protein A region 
of the TAP tag”’ and a C-terminal His₆ affinity tag. 

Protein expression and purification for X-ray crystallography, MALS and MS 
of the Ctf4(CTD) and biochemical analysis of the Ctf4(CTD)-Pol1 and 
Ctf4(CTD)-Sld5 interactions. Ctf4(CTD) was overexpressed in E. coli strain 
BL21(DE3)Rosetta2 with IPTG induction and overnight expression at 20 °C in LB medium. 
After overexpression, 4 l of cells were collected and re-suspended in 50 mM Tris 
pH 7.0, 500 mM NaCl, 10% (w/v) glycerol, 1 mM DTT and protease inhibitors (Sigma). 
Cells were lysed via sonication, the crude extract was clarified by centrifugation and 
the supernatant was applied to a 4-ml column of nickel agarose resin (Sigma) using 
gravity flow. The column with bound Ctf4(CTD) was washed in buffer supplemented 
with 20 mM imidazole and Ctf4(CTD) elution was performed with buffer 
supplemented with 200 mM imidazole. Eluted Ctf4(CTD) was further purified by 
gel filtration chromatography over a Superdex 200 16/60 HiLoad column (GE Healthcare) 
in 25 mM HEPES pH 7.0, 200 mM NaCl and 10% (w/v) glycerol and peak 
fractions were pooled, concentrated to 10 mg ml⁻¹, flash frozen in liquid nitrogen 
and stored in small aliquots at −80 °C. Selenomethionine labelling of Ctf4(CTD) 
was achieved by metabolic inhibition of the methionine pathway” and overnight 
expression as for the wild-type protein. The selenomethionine-labelled protein was 
purified in the same way as the native Ctf4(CTD) except that all buffers were 
supplemented with 10 mM DTT. 

GINS constructs were overexpressed in E. coli strain BL21(DE3)Rosetta2 with 
IPTG induction and overnight expression at 25 °C in LB medium. After overexpression, 
8 l of cells were harvested and re-suspended in 50 mM Tris pH 7.0, 500 mM 
NaCl, 10% (w/v) glycerol, 1 mM DTT and protease inhibitor. Cells were lysed via 
sonication, the crude extract was clarified by centrifugation and the supernatant 
was applied to a 3-ml column of nickel agarose resin (Sigma) using gravity flow. 
GINS bound to beads was eluted with buffer supplemented with 10 mM imidazole. 
The salt concentration of the eluted GINS sample was adjusted to below 160 mM 
NaCl and the protein was applied to an ion-exchange 6-ml Resource Q column 
pre-equilibrated in 20 mM HEPES pH 8.0, 160 mM NaCl and eluted with a buffer 
gradient of 0.16 M to 0.5 M NaCl over 40 column volumes. Peak fractions containing 
GINS were pooled and further purified by gel filtration over a Superdex 200 16/60 
column as described for Ctf4(CTD). Purified GINS samples were flash-frozen in 
liquid nitrogen and stored in small aliquots at —80 °C. 
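
The salt adjustment and gradient described for the ion-exchange step reduce to simple arithmetic, sketched below. The sketch assumes that the nickel-column eluate is still at the 500 mM NaCl of the lysis buffer and that it is diluted with salt-free buffer; neither point is stated explicitly in the text, and the function names are hypothetical.

```python
def min_dilution_factor(start_mm: float, target_mm: float) -> float:
    """Fold-dilution with salt-free buffer needed to reach target_mm (C1*V1 = C2*V2)."""
    if target_mm <= 0 or start_mm < target_mm:
        raise ValueError("need 0 < target_mm <= start_mm")
    return start_mm / target_mm


def gradient_profile(start_mm: float, end_mm: float,
                     column_volumes: float, column_ml: float):
    """Return (mM NaCl increase per column volume, total gradient volume in ml)."""
    slope = (end_mm - start_mm) / column_volumes
    return slope, column_volumes * column_ml


if __name__ == "__main__":
    # Assumed starting salt: 500 mM NaCl; target: 160 mM NaCl.
    print(f"dilute at least {min_dilution_factor(500, 160):.3f}-fold")
    # 0.16 M to 0.5 M NaCl over 40 column volumes on a 6-ml column.
    slope, volume = gradient_profile(160, 500, 40, 6)
    print(f"{slope:.1f} mM per column volume over {volume:.0f} ml")
```

Under these assumptions the sample needs a little over three-fold dilution before loading, and the gradient is shallow, rising by less than 10 mM NaCl per column volume.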

Protein expression and purification for electron microscopy and MALS. Each 
expression construct was transformed into BL21 (DE3)-CodonPlus cells (Stratagene) 
and 2 to 4 l of cells were grown to an optical density of 0.5 before induction 
with 1 mM IPTG, at 37 °C for 2 h. Each 2-l cell pellet was re-suspended in 40 ml lysis 
buffer and cells were lysed via sonication. The resulting lysate was subject to the 
following purification steps. 

Ctf4 purification: cleared lysate containing His-tagged full-length Ctf4 or Ctf4(CTD) 
was incubated with 1 ml Ni-NTA resin (Qiagen), washed with 20 ml buffer A (50 mM 
NaH₂PO₄ pH 8.0, 300 mM NaCl, 20 mM imidazole) and eluted five times each with 
1 ml buffer A containing 250 mM imidazole. The resulting elution was dialysed in 
2 l of 100 mM NaCl, 20 mM Tris pH 8.0, 1 mM DTT for 2 h with fresh buffer exchanged 
after the first hour. The dialysed elution was further purified by Mono Q (GE Healthcare) 
ion exchange through a 0.1-1 M NaCl gradient in 20 mM Tris pH 8.0, 1 mM 
DTT over 40 ml with 0.5 ml elutions. Peak fractions from the Mono Q were concentrated 
and polished via Superdex 200 16/600 HiLoad or 10/300 GL (GE Healthcare) 
size exclusion chromatography in buffer B (150 mM NaCl, 20 mM Tris pH 8.0). 
Peak elutions were pooled and concentrated to 5 mg ml⁻¹ and stored at −80 °C in 
2 nmol aliquots. 

GINS purifications: cleared lysate containing Strep II-tagged GINS(Psf1ΔC) was 
incubated with 1 ml StrepTactin resin (IBA Life Sciences), washed with 20 ml buffer 
C (150 mM NaCl, 100 mM Tris pH 8.0) and eluted five times each with 1 ml buffer 
C supplemented with 2.5 mM desthiobiotin (IBA Life Sciences). Proteins were stored 
at −80 °C with a concentration of 1 mg ml⁻¹ in 2 nmol aliquots. 

Pol1 purification: cleared lysate containing N-terminal Strep II- and C-terminal 
His-tagged Pol1-NTD-Protein A (hereafter referred to as Pol1(NTD)) was purified 
first via Ni-NTA, followed by StrepTactin affinity using the same method described 
above. 

The typical yields from 2 l of cells for full-length Ctf4, Ctf4(CTD), GINS(Psf1ΔC) 
and Pol1(NTD) are around 0.15 mg, 0.65 mg, 2.5 mg and 1.5 mg, respectively. The 
identity of all proteins was confirmed by trypsinization/mass spectrometry using a 
LTQ OrbitrapXL instrument (Protein Analysis and Proteomics, LRI). 

In vitro reconstitution of recombinant protein complexes for electron microscopy. 
For the Ctf4-GINS complex, 2 nmol of recombinant yeast Ctf4 (full-length 
or CTD) and GINS(Psf1ΔC) were co-incubated in 500 mM sodium acetate for 10 min 
on ice with a reaction volume of around 200 µl. To achieve high reproducibility the 
following procedure was followed. The reconstitution mix was initially dialysed in 
500 mM sodium acetate, 25 mM HEPES pH 7.6, 0.5 mM DTT for 1 h at 4 °C in 
dialysis tubes with 6,000-8,000 Da MWCO (GeBAflex). The dialysis buffer was 
changed hourly to contain progressively 400 mM, 300 mM, 200 mM, 150 mM sodium 
acetate. 100-150 µl of the final reconstituted complex was separated via glycerol gradient 
sedimentation. For the Ctf4(CTD)-Pol1(NTD) complex and the 
Ctf4(CTD)-GINS(Psf1ΔC)-Pol1(NTD) complex, 2 nmol of recombinant Ctf4(CTD) was used, 
1:2 and 1:1:1 mol ratios were applied respectively and the dialysis was performed 
in buffers containing progressively 400 mM, 300 mM, 200 mM, 100 mM, 50 mM 
sodium acetate. 

Glycerol gradient sedimentation with GraFix. Glycerol gradient sedimentation 
of full-length Ctf4, Ctf4(CTD) and complexes of Ctf4(CTD) with GINS(Psf1ΔC) 
and Pol1(NTD) was performed based on the GraFix method”. Briefly, 5 ml 10% to 
30% or 15% to 35% glycerol gradients were poured either with or without 0% to 
0.1% glutaraldehyde gradient. The protein or reconstituted protein complex was 
loaded on top of the gradient and centrifuged at 50,000 r.p.m., 4°C in a SW 55 Ti 
ultracentrifuge rotor (Beckman Coulter) for 16 h. Fractions were collected manually 
from the top of the gradient, resolved through a 4%-to-12% polyacrylamide-gradient 
gel (Biorad) in MOPS buffer at room temperature and silver stained for analysis. 

Crystallization and structure determination of Ctf4(CTD). Ctf4(CTD) crystals 
were grown by vapour diffusion in hanging drop, mixing equal volumes of Ctf4(CTD) 
protein at 10 mg ml⁻¹ and 0.2 M tri-sodium citrate pH 6.2, 7-9% PEG 8000 and 
0.45-0.9 M NaCl at 19 °C. Ctf4(CTD) crystals appeared within 2-3 days and grew 
to full size over the course of 2 weeks. For structure determination, 
selenomethionine-labelled Ctf4(CTD) crystals were grown against 0.2 M tri-sodium citrate pH 6.2 
and 8-10% PEG 8000 at 19 °C, using the same protein concentration and drop ratio 
as for the native protein. 

X-ray diffraction data for selenomethionine-labelled Ctf4(CTD) crystals were 
collected at the peak wavelength of the selenium K-edge (λ = 0.97938 Å) at beamline 
I03 of the Diamond Light Source. The data were integrated with XDS*, space 
group symmetry was assigned in POINTLESS and intensities scaled in AIMLESS”. 
The selenomethionine protein crystallized in the orthorhombic space group P22₁2₁ 
with unit cell dimensions of a = 107.1 Å, b = 118.1 Å, c = 155.7 Å and one Ctf4(CTD) 
trimer per asymmetric unit. The position of the selenium atoms was determined 
using the single-wavelength anomalous dispersion (SAD) method in PHENIX 
Autosol, an interpretable electron density map was calculated to a resolution of 2.7 Å 
and an initial model was generated using the PHENIX AutoBuild function”. The 
crystallographic model was extended and completed by repeated cycles of manual 
building in Coot and crystallographic refinement with PHENIX Refine’*’’. The 
final model was refined using data to 2.7 Å, to R-work and R-free values of 0.1895 
and 0.2284 and a Molprobity score of 1.15 (ref. 38). The following amino acids were 
not included in the final model due to missing or poor electron density and are 

©2014 Macmillan Publishers Limited. All rights reserved 


presumed to be disordered: 471 to 473, 644 to 647, 797 to 813 and 926 to 927 in 
chain A; 471 to 473, 644 to 647, 664 to 670, 794 to 813 and 924 to 927 in chain B; 
471 to 473, 664 to 670, 794 to 813 and 926 to 927 in chain C. Statistics of data pro- 
cessing and crystallographic refinement are reported in Extended Data Table 1. 
X-ray diffraction data for the native Ctf4(CTD) crystals were collected at beamline 
I04 of the Diamond Light Source and the data were processed as for the 
selenomethionine data set. The native protein crystallized in the same orthorhombic 
space group P22₁2₁ as the selenomethionine protein crystals, but with different unit 
cell dimensions a = 88.9 Å, b = 100.0 Å, c = 219.3 Å, caused by an alternative set of 
crystal contacts made by the Ctf4(CTD) trimer in the asymmetric unit. The structure 
of native Ctf4(CTD) was solved by molecular replacement in PHASER, using 
the structure of one protomer of the selenomethionine Ctf4(CTD) trimer as search 
model. The final model was refined using data to 3.0 Å resolution, to R-work and 
R-free values of 0.1674 and 0.2049 and a Molprobity score of 1.42. The following 
amino acids were not included in the final model due to missing or poor electron 
density and are presumed to be disordered: 471 to 473, 664 to 670, 792 to 813 in 
chain A; 471 to 473, 797 to 813 in chain B; 471 to 473, 664 to 670, 777 to 927 (helical 
domain) in chain C. In this crystal form, the Ctf4(CTD) structure adopts a more 
open conformation where one interface between Ctf4 protomers widens to become 
a narrow gap and the helical domain of one of the Ctf4(CTD) protomers at the 
interface becomes disordered (Extended Data Fig. 4). Statistics of data processing 
and crystallographic refinement are reported in Extended Data Table 1. 
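As a quick sanity check on the two crystal forms, the unit cell volumes can be compared directly from the cell dimensions quoted above. The sketch below is illustrative only and is not part of the reported analysis; it relies on the fact that all cell angles are 90° in an orthorhombic cell, so the volume is simply a × b × c. The function name is a hypothetical helper.

```python
def orthorhombic_cell_volume(a, b, c):
    """Return the volume (in cubic angstroms) of an orthorhombic unit cell.

    All three cell angles are 90 degrees, so V = a * b * c.
    """
    return a * b * c


# Cell dimensions (in angstroms) as quoted in the text for the two crystal forms.
semet_volume = orthorhombic_cell_volume(107.1, 118.1, 155.7)
native_volume = orthorhombic_cell_volume(88.9, 100.0, 219.3)

print(f"Selenomethionine form: {semet_volume:,.0f} cubic angstroms")
print(f"Native form:           {native_volume:,.0f} cubic angstroms")
print(f"Native / selenomethionine volume ratio: {native_volume / semet_volume:.3f}")
```

The two volumes differ by only about 1%, which fits the description of the same trimer packing through a different set of crystal contacts.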
Co-crystallization of Ctf4(CTD) with Pol α and Sld5. For co-crystallization 
experiments, the peptides IDNFDDILGEFES and MDINIDDILAELDKETTAV, 
corresponding to amino acids 137-149 of yeast Pol1 and 1-19 of yeast Sld5, respectively, 
were synthesized. The Pol1 peptide was solubilized in the same buffer as purified 
Ctf4(CTD) to a concentration of 340 µM; the Sld5 peptide was solubilized in 
water to a concentration of 2 mM. Soaking was performed by adding 1 µl of Pol1 
peptide or 0.5 µl of Sld5 peptide to a 2 µl crystallization drop containing native 
Ctf4(CTD) crystals. The crystals were soaked with the peptide for 24 h at 19 °C, 
back-soaked in crystallization buffer and flash-frozen in liquid nitrogen. X-ray diffraction 
data for Ctf4(CTD) crystals soaked with the Pol α and Sld5 peptides were 
collected on beamline I04 of the Diamond Light Source and processed as for the 
native crystals. The positions of the Ctf4-binding motifs of Pol α and Sld5 in the 
crystal structure of Ctf4(CTD) were readily identified by inspection of Fo−Fc difference 
Fourier maps. Amino acids 140-149 of Pol1 and 3-15 of Sld5 were built in 
the electron density map and the structures of Ctf4(CTD) bound to Pol α and Sld5 
were then further refined using Coot and PHENIX Refine to R-work/R-free values 
of 0.1718/0.2099 and 0.1787/0.2141, respectively. MolProbity scores for the Ctf4(CTD)-Pol α 
and Ctf4(CTD)-Sld5 structures were 1.33 and 1.32, respectively. Statistics of data 
processing and crystallographic refinement are reported in Extended Data Table 1. 
Sample preparation for EM. Negative stain analysis was performed using 400 mesh 
carbon coated grids (Agar Scientific). Carbon was evaporated onto freshly cleaved 
mica with a Q150T E coater (Quorum Technologies) and incubated overnight before 
floating. Dried carbon grids were glow discharged for 30-60 s at 45 mA using a 
100x glow discharger (Electron Microscopy Sciences). A 4-µl drop of the peak fraction 
from each GraFix-processed sample was applied onto the grid. Subsequently, 
grids were sequentially laid on top of five distinct 75 µl drops of 2% uranyl formate 
solution, and stirred for 10 s each time, before blotting to dryness. 
EM data collection. Negative stain analyses of all complexes were performed using a 
Tecnai LaB6 G2 Spirit transmission electron microscope (FEI) operating at 120 keV 
(Electron Microscopy Unit, London Research Institute). Images were recorded using 
a 2k × 2k GATAN Ultrascan 100 camera at a nominal magnification of 30,000 
(3.45 Å/pixel at the specimen level). Between 100 and 350 micrographs were collected 
for each data set. 
Single-particle analysis. CTF-corrected image stacks were prepared in the EMAN2 
environment. Single-particle symmetry analysis was performed as described. 
Reference-free two-dimensional class averages were calculated using the one-step 
rotation and classification approach as described, followed by routine MSA/MRA 
IMAGIC protocols. 
GST pull downs. For each Pol α and Sld5 construct to be tested for interaction 
with Ctf4(CTD), a 25-ml E. coli BL21(DE3) culture overexpressing the GST fusion 
construct was pelleted, re-suspended in buffer containing 50 mM Tris pH 7.0, 500 mM NaCl, 
10% (w/v) glycerol, 1 mM DTT and protease inhibitors (Sigma) and lysed by sonication. 
After centrifugation, the soluble extract was mixed with 50 µl of Glutathione 
Sepharose beads (GE Healthcare) pre-equilibrated in the same buffer and 
incubated under rotation at 4 °C for 1 h. Unbound protein was removed by three 
consecutive washes with 1 ml of buffer, followed by three 1-ml washes with pull-down 
buffer (20 mM HEPES pH 7.2, 150 mM NaCl, 5% (w/v) glycerol, 0.1% Igepal CA-630, 
1 mM TCEP and 1% BSA). Subsequently, 500 µl of purified Ctf4(CTD) protein 
at a concentration of 2 mg ml⁻¹ was added to the Sepharose beads and binding 
was allowed to take place for an additional hour at 4 °C. The binding reaction was 
stopped by two consecutive washes with 1 ml of pull-down buffer and a final 1-ml 
wash with pull-down buffer without BSA. The Sepharose beads were mixed with 
SDS loading dye and Ctf4(CTD) interactions with the respective bait proteins were 
detected via SDS-PAGE. As a control, Ctf4(CTD) was tested for unspecific inter- 
action with the Glutathione Sepharose resin and with GST and in both cases no inter- 
action was detected. 

Yeast strains and growth. The yeast strains CC2619 (POL1-9MYC pep4Δ::URA3 
ADE2) and CC10682 (pol1-A POL1-9MYC pep4Δ::URA3 ADE2) were grown at 24 °C 
in rich medium (1% yeast extract, 2% peptone, 40 µg/ml adenine) with 2% glucose as 
carbon source. Cells were synchronized in G1 phase by adding 7.5 µg ml⁻¹ alpha 
factor mating pheromone for 70% of one generation time, followed by additional 
aliquots of 2.5 µg ml⁻¹ every 20 min up to 1.5 generation times. 
Immunoprecipitation of proteins from yeast cell extracts. Myc-tagged proteins 
were isolated from yeast cell extracts as described previously. 

Gel filtration. All proteins were purified as described above. Putative complexes 
were reconstituted before analytical gel filtration by mixing stoichiometric molar 
ratios of Ctf4(CTD) with either GINS(Psf1ΔC) or GINS(Psf1ΔC,Sld5ΔN) followed 
by centrifugation at 16,000g for 10 min at 4 °C to remove potential aggregates. 100 µl 
samples of Ctf4(CTD), GINS(Psf1ΔC), GINS(Psf1ΔC,Sld5ΔN), Ctf4(CTD)-GINS(Psf1ΔC) 
and Ctf4(CTD)-GINS(Psf1ΔC,Sld5ΔN) were subsequently fractionated 
over a Superdex S200 HR 10/300 column (GE Healthcare) pre-equilibrated in 20 mM 
HEPES pH 7.2, 140 mM KCl. 

MALS analysis of Ctf4(CTD). 100 µl of Ctf4(CTD) protein at a concentration of 
2 mg ml⁻¹ was loaded onto a Superdex S200 HR 10/300 gel-filtration column (GE 
Healthcare) in 25 mM HEPES pH 7.0, 200 mM NaCl at a flow rate of 0.5 ml min⁻¹. 
The column was controlled using an ÄKTA Purifier System (GE Healthcare) and was 
linked to a DAWN 8+ 8-angle light scattering detector (Wyatt Technology) with a 
fused silica sample cell using a laser wavelength of 664 nm. The change in the refractive 
index was detected using an Optilab T-rEX refractometer with extended range 
(Wyatt Technology) using a wavelength of 658 nm. Data collection and analysis were 
carried out using the ASTRA6 software package (Wyatt Technology). Molecular 
mass determination across the sample peak was carried out using a Zimm-plot derived 
global fitting algorithm with a fit degree of 1 and a dn/dc value of 0.1850 ml g⁻¹. 
MALS analysis of full-length Ctf4. Around 100 µg of Ctf4 full-length protein was 
loaded onto a Wyatt MP-030S5 HPLC size-exclusion chromatography column (Wyatt) 
mounted on an ÄKTA Micro (GE Healthcare) chromatography system. The column was 
equilibrated in a buffer containing 150 mM NaCl, 20 mM Tris pH 8.0, 1 mM DTT. 
The chromatography system was coupled to an 8-angle light scattering detector 
(DAWN 8+) and a refractive index detector (OptiLab T-Rex) (Wyatt Technology). 
Data were collected every 0.5 s. Data analysis was carried out using ASTRA VI. 
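The oligomeric state implied by a MALS measurement follows from dividing the measured absolute mass by the expected monomer mass. The sketch below uses the values quoted in the legend of Extended Data Fig. 2e (322 kDa measured, 106.9 kDa expected for a monomer); the function name is a hypothetical helper and the rounding step is a simplification, not the ASTRA analysis itself.

```python
def oligomeric_state(measured_mass_kda, monomer_mass_kda):
    """Return (mass ratio, nearest whole number of protomers)."""
    ratio = measured_mass_kda / monomer_mass_kda
    return ratio, round(ratio)


# Values quoted for full-length Ctf4 in Extended Data Fig. 2e.
ratio, protomers = oligomeric_state(322.0, 106.9)
print(f"Measured/monomer mass ratio: {ratio:.2f}")
print(f"Nearest integer stoichiometry: {protomers}")
```

A ratio close to 3 is what one would expect for the homotrimer described in the text.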
Fluorescence polarization. Both the Pol1 peptide (137-IFDNDDILGEFES-149) 
and the yeast Sld5 peptide (1-MDINIDDILAELDKETTAV-19) were synthesized 
with an N-terminal fluorescein label. The lowest concentration of peptide at which 
the binding studies could be performed was determined via peptide calibration curves. 
Fluorescence anisotropy measurements were recorded in a PHERAstar Plus multi-detection 
plate reader (BMG Labtech) equipped with a fluorescence polarization optic 
module (λex = 485 nm; λem = 520 nm) at 25 °C. Each data point is the mean of 
200 flashes per well. The voltage gain was set by adjusting the target mP values of 
fluorescein-labelled peptides relative to that of fluorescein (35 mP). Serial dilutions 
of Ctf4(CTD) were made in 20 mM HEPES, pH 7.2, 140 mM KCl and 5% (w/v) 
glycerol in the presence of 40 nM (Sld5) or 50 nM (Pol1) fluorescein-labelled pep- 
tide. Each data point is the mean of three independent experiments. Curve fitting to 
the experimental data was performed in ProFit 6.2 (QuantumSoft) using a Robust 
fitting algorithm in combination with a Lorentzian error distribution analysis. 
Native mass spectrometry. In preparation for non-denaturing nano-electrospray 
ionization mass spectrometry (native mass spectrometry), protein samples were subjected 
to two successive rounds of buffer exchange into 500 mM ammonium acetate 
using illustra NAP-5 columns (GE Healthcare). For reconstitution of the Ctf4(CTD)-Pol α 
and Ctf4(CTD)-Sld5 complexes, Ctf4(CTD) was incubated with a tenfold or 
fivefold molar excess of Pol1 peptide 137-IDNFDDILGEFES-149 or Sld5 peptide 
1-MDINIDDILAELDKETTAV-19, respectively, for 30 min before buffer exchange. 
After buffer exchange, samples were concentrated to at least 50 µM in preparation 
for mass spectrometric analysis. Native mass spectra were recorded on a Synapt 
HDMS instrument (Waters), and calibrated using caesium iodide (100 mg ml⁻¹) 
as described previously. Typical parameter values were: capillary voltage 1.8 kV, 
cone voltage 40-80 V, cone gas 40 l h⁻¹, extractor 1.2-2.2 V, ion transfer stage pressure 
3.61-3.44 mbar, trap collision energy 10-15 V, transfer collision energy 10-20.0 V, 
trap and transfer pressure 5.29-5.33 × 10⁻² mbar, IMS pressure 5.01-5.02 × 10⁻¹ 
mbar, TOF analyser pressure 1.17-1.18 × 10⁻⁶ mbar. Micromass MassLynx 4.1 was 
used for data acquisition and processing. 

Artwork. All structural drawings were prepared with UCSF Chimera. 


Extended Data Figure 1 | Crystal structure of yeast Ctf4 region spanning 
amino acids 471-927 (C-end; Ctf4(CTD)). a, The six-bladed β-propeller 
of Ctf4(CTD). The structure is drawn as a ribbon diagram, coloured blue to 
light green from the N to the C terminus. The six blades of the propeller are 
labelled WD1 to WD6, from the N to C end. The helical domain has been 
omitted for clarity. b, Two views of the Ctf4(CTD) protomer, highlighting the 
helical domain of Ctf4(CTD), coloured yellow to red from the amino to the 
carboxy terminus. The α-helices are labelled α1 to α6. The β-propeller domain 
is in light grey. 
Extended Data Figure 2 | 2D EM of full-length Ctf4 and Ctf4(CTD). 

a, Silver-stained SDS-PAGE gel showing the purified full-length yeast Ctf4 and 
GraFix gel of the same preparation. A red box highlights the fraction imaged 
by EM. b, Representative micrograph of the full-length yeast Ctf4. c, Zoomed-in 
view of the micrograph in panel b. d, Reference-free class averages of the 
full-length protein complex highlight the presence of a ring-shaped core linked 
to mobile, satellite densities. Box size 448 Å. e, Multi-angle light scattering 
reveals that Ctf4 forms stable homotrimers in solution (absolute molecular 
mass equates to 322 kDa, with a 0.6% error; expected molecular mass for a 
monomer: 106.9 kDa). f, Silver-stained SDS-PAGE gel showing the purified 
Ctf4(CTD) and GraFix gel of the same preparation. A red box highlights the 
fraction imaged by EM. g, Representative micrograph of the Ctf4(CTD) 
complex. h, Zoomed-in view of the micrograph in panel g. i, Reference-free 
class averages of the Ctf4(CTD) protein complex highlight the presence of an 
oligomerization core. Box size 448 Å. 
Extended Data Figure 3 | Crystal forms of the Ctf4(CTD) structure. 

a, b, Side-by-side comparisons of the symmetric, closed form of the Ctf4(CTD) 
trimer (left-hand side) with the pseudo-symmetric, open form (right-hand 
side), used for study of the interaction with the Ctf4-binding motifs of Pol α and 
Sld5. Panel a shows a side view of the two crystal forms, drawn as ribbons, with 
the β-propeller domain of the three protomers coloured blue, cyan and light 
blue, respectively, and the helical domain coloured yellow. Panel b shows a top 
view of the two crystal forms, drawn as ribbons, with the symmetric, closed 
form of the Ctf4(CTD) trimer in light grey and the pseudo-symmetric, open 


form in dark grey. The helical domains have been removed for clarity. 

c-e, Superpositions of the Cα traces of the protomers of the symmetric, closed 
form of the Ctf4(CTD) trimer, the protomers of the pseudo-symmetric, open 
Ctf4(CTD) trimer and all protomers of the two crystal forms, respectively. 

In panels c and d, the protomers are coloured according to secondary structure, 
with β-strands in cyan and α-helices in yellow; the crystal form of the 
superimposed protomers is highlighted by a space-fill model of the structure in 
the top-left corner of the panel. In panel e, the Ctf4(CTD) protomers of the 
closed and open forms are coloured light and dark grey, respectively. 
Extended Data Figure 4 | Atomic details of the Ctf4-Pol α (a) and 
Ctf4-Sld5 (b) interfaces. Ctf4(CTD) is drawn as a yellow ribbon, the 
Ctf4-binding motifs of Pol α and Sld5 as green and red tubes. The side chains of 
amino acids at the interface are shown as sticks, with carbon atoms coloured 
white (Ctf4) or light brown (Pol α and Sld5), oxygen atoms in red, nitrogen 
atoms in blue and sulphur atoms in yellow. The bidentate salt link between 
D142 (Pol1) or D7 (Sld5) and R904 of Ctf4 is shown as solid pink lines. 
Extended Data Figure 5 | Structure-based multiple sequence alignment of 
S. cerevisiae Ctf4, S. pombe Mcl1 and H. sapiens And1. Only the region 
corresponding to the crystal structure described in the paper is reported. 
Observed secondary structure elements of Ctf4 and predicted secondary 
structure elements of Mcl1 and And1 are boxed and shaded in green and yellow 
for β-sheets and α-helices, respectively. The extent of the WD40 domains of the 
six-bladed β-propeller and the α-helices of the helical domain are illustrated 
above the alignment. Ctf4 residues that form the interface with Pol α and Sld5 
are marked by an asterisk. 
Extended Data Figure 6 | Ctf4 interactions with Pol α and CMG. a, Binding 
affinity of Ctf4(CTD) for the Ctf4-binding motifs of Pol α (top panel) and 
Sld5 (bottom panel). Affinity was measured by fluorescence anisotropy of 
fluorescein-labelled peptides in the presence of increasing amounts of Ctf4. See 
Methods for experimental details. b, CMG still associates with Ctf4 and Pol1 in 
yeast cells with mutations in the Ctf4-binding motif of Sld5. The budding 
yeast strains MCM4-5FLAG (Control) and MCM4-5FLAG sld5-Δ2-9 
(the endogenous copy of SLD5 was modified to create sld5-Δ2-9, such that the 
encoded protein lacks amino acids 2-9) were grown at 24 °C, arrested in G1 
phase, and then released into S phase for 30 min. Mcm4-5FLAG was then 
isolated from cell extracts by immunoprecipitation, and the indicated proteins 
were detected by immunoblotting with the corresponding antibodies 
(top panel). An analogous experiment was performed with MCM4-5FLAG 
(Control) and MCM4-5FLAG sld5-GA (the endogenous copy of SLD5 was 
modified to create sld5-GA, such that amino acids 5-9 were changed from 
Ile-Asp-Asp-Ile-Leu to Gly-Ala-Gly-Ala-Gly) (bottom panel). 
Extended Data Figure 7 | 2D EM analysis of Ctf4(CTD)-GINS complexes. 
Comparison between GINS-Ctf4(CTD) complexes prepared by glycerol 
gradient or GraFix. a, SDS-PAGE gel of non-crosslinked GINS-Ctf4(CTD) 
complex. A red box highlights the fraction imaged by EM. b, Representative 
micrograph for the non-crosslinked preparation highlights small, globular 
particles. c, SDS-PAGE gel of crosslinked GINS-Ctf4(CTD) complex. A red 
box highlights the fraction imaged by EM. d, Representative micrograph for the 
crosslinked preparation highlights elongated features compatible with one, 
two or three GINS docked onto a Ctf4 trimerization core. e, Zoomed-in view of 
the same micrograph. f, Representative class averages of the Ctf4(CTD)-GINS 
complex show a mixture of complexes with clearly discernible stoichiometry: 
Ctf4₃-GINS, Ctf4₃-(GINS)₂, Ctf4₃-(GINS)₃. The box size is 448 Å. 
Extended Data Figure 9 | 2D EM analysis of Ctf4(CTD)-GINS-Pol1(NTD) 
complexes. a, Silver-stained SDS-PAGE gel showing the purified 
GINS-Ctf4(CTD)-Pol1(NTD) complex and GraFix gel of the same 
preparation. A red box highlights the fraction imaged by EM. b, Representative 
micrograph of the GINS-Ctf4(CTD)-Pol1(NTD) complex. c, Zoomed-in view 
Extended Data Table 1 | X-ray diffraction data processing and crystallographic refinement

                          Native                  Seleno-methionine       Pol α soak              Sld5 soak
Data collection
Space group               P22₁2₁                  P22₁2₁                  P22₁2₁                  P22₁2₁
Cell dimensions
  a, b, c (Å)             89.1, 100.3, 219.5      107.3, 118.4, 155.7     89.0, 100.0, 219.5      88.9, 99.7, 218.6
  α, β, γ (°)             90.0, 90.0, 90.0        90.0, 90.0, 90.0        90.0, 90.0, 90.0        90.0, 90.0, 90.0
                                                  Peak
Wavelength (Å)            0.9173                  0.97944                 0.9795                  0.9795
Resolution (Å)            49.17-3.0 (3.16-3.0)*   49.9-2.69 (2.77-2.69)   49.2-2.69 (2.77-2.69)   49.07-2.69 (2.77-2.69)
Rsym or Rmerge            0.194 (0.983)           0.228 (3.383)           0.163 (1.473)           0.174 (1.691)
I/σI                      12.6 (2.5)              17.4 (1.1)              10.2 (1.4)              10.3 (1.4)
Completeness (%)          99.9 (99.9)             99.5 (94.5)             99.6 (95.0)             99.2 (90.5)
Redundancy                9.3 (9.7)               26.7 (23.9)             7.3 (7.1)               7.3 (7.1)

Refinement
Resolution (Å)            48.87-3.0               49.19-2.69              49.2-2.694              47.92-2.694
No. reflections           40149                   55384                   54707                   54193
Rwork/Rfree               0.1674/0.2049           0.1895/0.2248           0.1718/0.2099           0.1787/0.2141
No. atoms                 9447                    10421                   9617                    9684
  Protein                 9349                    10258                   9457                    9501
  Ligand/ion or water     98                      163                     160                     183
B-factors                 51.6                    73.4                    59.2                    59.5
  Protein                 51.8                    73.7                    59.5                    59.8
  Ligand/ion or water     3305                    56                      44.2                    47.1
R.m.s. deviations
  Bond lengths (Å)        0.003                                           0.003                   0.002
  Bond angles (°)         0.65                                            0.67                    0.63

*Statistics for the highest-resolution shell are shown in parentheses. 
Quantitative flux analysis reveals folate-dependent 


NADPH production 


Jing Fan, Jiangbin Ye, Jurre J. Kamphorst, Tomer Shlomi, Craig B. Thompson & Joshua D. Rabinowitz 


ATP is the dominant energy source in animals for mechanical and 
electrical work (for example, muscle contraction or neuronal firing). 
For chemical work, there is an equally important role for NADPH, 
which powers redox defence and reductive biosynthesis. The most 
direct route to produce NADPH from glucose is the oxidative pentose 
phosphate pathway, with malic enzyme sometimes also important. 
Although the relative contribution of glycolysis and oxidative phos- 
phorylation to ATP production has been extensively analysed, similar 
analysis of NADPH metabolism has been lacking. Here we demon- 
strate the ability to directly track, by liquid chromatography-mass 
spectrometry, the passage of deuterium from labelled substrates into 
NADPH, and combine this approach with carbon labelling and math- 
ematical modelling to measure NADPH fluxes. In proliferating cells, 
the largest contributor to cytosolic NADPH is the oxidative pentose 
phosphate pathway. Surprisingly, a nearly comparable contribution 
comes from serine-driven one-carbon metabolism, in which oxidation 
of methylene tetrahydrofolate to 10-formyl-tetrahydrofolate is 
coupled to reduction of NADP⁺ to NADPH. Moreover, tracing of mitochondrial 
one-carbon metabolism revealed complete oxidation of 
10-formyl-tetrahydrofolate to make NADPH. As folate metabolism 
has not previously been considered an NADPH producer, confirmation 
of its functional significance was undertaken through knockdown 
of methylenetetrahydrofolate dehydrogenase (MTHFD) genes. Depletion 
of either the cytosolic or mitochondrial MTHFD isozyme resulted 
in decreased cellular NADPH/NADP⁺ and reduced/oxidized glutathione 
ratios (GSH/GSSG) and increased cell sensitivity to oxidative 
stress. Thus, although the importance of folate metabolism for 
proliferating cells has been long recognized and attributed to its func- 
tion of producing one-carbon units for nucleic acid synthesis, another 
crucial function of this pathway is generating reducing power. 
Previous examination of NADPH production during cell growth has 
analysed metabolic fluxes in cells using ¹³C and ¹⁴C isotope tracers. 
For NADPH metabolism, however, carbon tracers alone are insufficient, 
because they cannot determine whether a particular redox reaction is 
making NADH versus NADPH or the reaction's fractional contribution 
to total cellular NADPH production. To address these limitations, we 
developed a deuterium tracer approach that directly measures NADPH 
redox-active hydrogen labelling. To probe the oxidative pentose phosphate 
pathway, we shifted cells from unlabelled to [1-²H]glucose or 
[3-²H]glucose (Fig. 1a) and measured the resulting NADP⁺ and NADPH 
labelling by liquid chromatography-mass spectrometry (LC-MS), as 
shown in the mass spectrum in Fig. 1b (for associated chromatogram, 
see Extended Data Fig. 1a). The M+1 and M+2 peaks in NADP⁺ are 
natural isotope abundance, primarily from ¹³C. The difference between 
NADP⁺ and NADPH reflects the redox-active hydrogen labelling. The 
labelling of NADPH's redox-active hydrogen is fast (t₁/₂ ~ 5 min) (Fig. 1c; 
note that, as opposed to relative mass intensities, all fractional labelling data 
are corrected for natural isotope abundance). NADPH labelling was similar 
across four different transformed mammalian cell lines. Knockdown 
of the committed enzyme of the oxidative pentose phosphate pathway, 


glucose-6-phosphate dehydrogenase, eliminated most of the labelling, 
confirming that the NADPH-deuterium labelling reflects oxidative pen- 
tose phosphate pathway flux (Fig. 1d). 

Because most NADPH is cytosolic, the ²H-glucose labelling results 
can be used to quantitate the fractional contribution of the oxidative pen- 
tose phosphate pathway (oxPPP) to total cytosolic NADPH production: 


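$$\text{Fraction}_{\text{NADPH from oxPPP}} = \frac{\left(\dfrac{[^{2}\text{H}]\text{NADPH}}{\text{Total NADPH}}\right)}{\left(\dfrac{[^{2}\text{H}]\text{G6P}}{\text{Total G6P}}\right)} \times C_{\text{KIE}}$$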


The terms in parentheses are the fractional ²H-labelling of NADPH's 
redox-active hydrogen and of glucose-6-phosphate's targeted hydrogen 
(Fig. 1e, Extended Data Fig. 1b-d). The term C_KIE accounts for the deuterium 
kinetic isotope effect (see Methods, Extended Data Fig. 1e-g). 
Note that these ²H-labelling experiments directly measure the fraction 
of NADPH made by the oxidative pentose phosphate pathway without 
relying on measurement of the absolute pathway flux. Using either 
[1-²H]glucose or [3-²H]glucose, we find that the oxidative pentose phosphate 
pathway accounts for 30-50% of overall NADP⁺ reduction. 
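The relationship just described can be sketched as a small calculation: divide the fractional labelling of NADPH by that of the substrate and multiply by the kinetic isotope effect correction. The function name and the input values below are hypothetical, chosen only for illustration; they are not measurements from the paper, and the derivation of the correction term is described in the Methods, not here.

```python
def fraction_nadph_from_oxppp(nadph_labelled_fraction, g6p_labelled_fraction, c_kie):
    """Fractional oxPPP contribution to cytosolic NADPH production.

    nadph_labelled_fraction: [2H]NADPH / total NADPH (redox-active hydrogen)
    g6p_labelled_fraction:   [2H]G6P / total G6P (targeted hydrogen)
    c_kie:                   correction for the deuterium kinetic isotope effect
    """
    if not 0.0 < g6p_labelled_fraction <= 1.0:
        raise ValueError("substrate labelling must be in the interval (0, 1]")
    return nadph_labelled_fraction / g6p_labelled_fraction * c_kie


# Illustrative inputs only (assumed values, not data from the study).
example = fraction_nadph_from_oxppp(
    nadph_labelled_fraction=0.20,
    g6p_labelled_fraction=0.80,
    c_kie=1.6,
)
print(f"Fraction of NADPH from oxPPP: {example:.2f}")
```

Because both inputs are labelling fractions, the result does not depend on knowing the absolute pathway flux, which is the point made in the text.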
The inferred fractional contribution of the oxidative pentose phosphate 
pathway to NADPH production can be used to deduce the total cytosolic 
NADPH production rate, which is equal to the absolute oxidative pentose 
phosphate pathway flux divided by the fractional contribution of the 
oxidative pentose phosphate pathway to NADPH production (Fig. 1f). 
To this end, we measured absolute oxidative pentose phosphate pathway 
flux using two orthogonal approaches. The first approach measures ¹⁴CO₂ 
release from [1-¹⁴C]glucose versus [6-¹⁴C]glucose (Extended Data Figs 2a-c 
and 3). The second measures the kinetics of 6-phosphogluconate labelling 
from uniformly ¹³C-labelled glucose ([U-¹³C]glucose) (Extended Data 
Fig. 2d-f). Both approaches gave consistent fluxes, with the radioactive 
measurement being more precise (Extended Data Fig. 2g). As confirmation 
of its specificity, we knocked down glucose-6-phosphate dehydrogenase 
and observed markedly reduced oxidative pentose phosphate 
pathway CO₂ release (Fig. 1g). In the absence of such knockdown, 
the observed oxidative pentose phosphate pathway flux ranged from 
1-2.5 nmol µl⁻¹ h⁻¹ (in which volume is the packed cell volume; 
Fig. 1g). This flux is similar to, but slightly less than, the cellular ribose 
demand (Extended Data Fig. 3f). In combination with the fractional 
NADPH labelling, we deduced a total cytosolic NADPH production rate 
of ~10 nmol µl⁻¹ h⁻¹ (Fig. 1h), which is 5-20% of the glucose uptake rate. 
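The deduction of the total production rate can be illustrated with a short calculation. The sketch below divides the NADPH made by the oxidative pentose phosphate pathway by that pathway's fractional contribution. The `nadph_per_glucose` parameter is an assumption added here: it reflects the textbook stoichiometry of two NADPH per glucose-6-phosphate oxidized through the pathway, which this excerpt does not state explicitly. The input values are illustrative picks from within the reported ranges, not specific measurements.

```python
def total_cytosolic_nadph_production(oxppp_flux, oxppp_fraction, nadph_per_glucose=2.0):
    """Total cytosolic NADPH production rate (same units as oxppp_flux).

    oxppp_flux:        absolute oxPPP flux (e.g. nmol per microlitre per hour)
    oxppp_fraction:    fraction of cytosolic NADPH made by the oxPPP (0 to 1)
    nadph_per_glucose: NADPH produced per glucose through the oxPPP
                       (assumed textbook value of 2; set to 1.0 to disable)
    """
    if not 0.0 < oxppp_fraction <= 1.0:
        raise ValueError("fractional contribution must be in the interval (0, 1]")
    return oxppp_flux * nadph_per_glucose / oxppp_fraction


# Illustrative values within the reported ranges (flux 1-2.5, fraction 0.3-0.5).
total = total_cytosolic_nadph_production(oxppp_flux=2.0, oxppp_fraction=0.4)
print(f"Total cytosolic NADPH production: {total:.1f} nmol per microlitre per hour")
```

With these inputs the estimate lands near the ~10 nmol µl⁻¹ h⁻¹ quoted in the text, but the result is sensitive to both the flux and the fraction chosen.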
To investigate whether we could use ²H-labelling to directly observe 
NADPH production by other pathways (Fig. 2a), we administered 
[2,3,3,4,4-²H]glutamine and [2,3,3-²H]aspartate to cells. Downstream products 
of glutamine can potentially transfer ²H to NADPH via glutamate 
dehydrogenase or malic enzyme, whereas downstream products of aspartate 
may do so via isocitrate dehydrogenase (Extended Data Fig. 4a-f). 
We observed identical mass spectra for NADP⁺ and NADPH after feeding 
the deuterium-labelled glutamine and aspartate (Fig. 2b, c and Extended 
Data Fig. 4b, d), and thus could not directly assign a fractional 
Figure 1 | Quantification of NADPH labelling via the oxidative pentose 
phosphate pathway and of total cytosolic NADPH production. a, Oxidative 
pentose phosphate pathway schematic. b, Mass spectra of NADPH and 
NADP⁺ from cells labelled with [1-²H]glucose (iBMK-parental cells, 20 min). 
c, Kinetics of NADPH labelling from [1-²H]glucose (iBMK-parental cells). 
d, NADPH labelling from [1-²H]glucose (20 min). e, [1-²H]glucose and 
[3-²H]glucose yield similar NADPH labelling (iBMK-parental cells, 20 min). 
Substrate labelling is reported for glucose-6-phosphate for [1-²H]glucose and 
6-phosphogluconate for [3-²H]glucose. oxPPP, oxidative pentose phosphate 
pathway. f, Schematic illustrating that the total cytosolic NADP⁺ reduction flux 
is the absolute oxidative pentose phosphate pathway flux (measured based on 
¹⁴CO₂ excretion) divided by the fractional oxidative pentose phosphate 
pathway contribution (measured based on NADPH ²H-labelling). g, Oxidative 
pentose phosphate pathway flux based on difference in ¹⁴CO₂ release from 
[1-¹⁴C]glucose and [6-¹⁴C]glucose. h, Total cytosolic NADP⁺ reduction flux. 
All results are mean ± s.d., n = 2 biological replicates from a single experiment 
and results were confirmed in multiple experiments. 


contribution to these pathways. Given recent evidence that malic enzyme 
is particularly important in cancer, we used an orthogonal approach 
based on feeding [U-¹³C]glutamine and measuring labelling of pyruvate 
and lactate to evaluate its activity (Extended Data Fig. 4g, h). Although 
such carbon tracer studies cannot distinguish between NADH-dependent 
and NADPH-dependent malic enzyme, they put an upper bound on their 
collective activities, which ranged from 15% to 50% of cytosolic NADPH 
production depending on the cell line. 

To identify other potential NADPH-producing pathways, we used a 
genome-scale human metabolic model. We constrained the model based 
on the observed steady-state growth rate, biomass composition, and metabolite 
uptake and excretion rates of immortalized baby mouse kidney cells 
(iBMK-parental cells), without enforcing any constraints on NADPH 
production routes. The model, assessed via flux balance analysis with 
an objective of minimizing total enzyme expression requirements and 
hence flux (see Methods), predicted that both the oxidative pentose 
phosphate pathway and malic enzyme contribute ~30% of NADPH 
production (Fig. 2d). Surprisingly, however, ~40% of NADPH produc- 
tion was predicted to come from one-carbon metabolism mediated by 
tetrahydrofolate (THF). An alternative objective function of maximiz- 
ing growth rate further predicts a potentially substantial contribution 
of folate metabolism to NADPH production (Extended Data Fig. 5a, b). 

The main folate-dependent NADPH-producing pathway was predicted 
to involve transfer of a one-carbon unit from serine to THF, followed 
by oxidation of the resulting product (methylene-THF) by the 
enzyme MTHFD to form the purine precursor formyl-THF with concomitant 
NADPH production. To assess whether this pathway indeed 
contributes to NADPH production, we supplied cells with [2,3,3-²H]serine 
and observed labelling of both NADP⁺ and NADPH. The NADP⁺ labelling 
results from incorporation of the serine-derived formyl-THF one-carbon 
unit into the adenine ring of NADP⁺. Relative to NADP⁺, the 
labelling pattern of NADPH was shifted towards more heavily labelled 
forms, indicating specific labelling of the redox-active hydrogen of NADPH 
(Fig. 2e and Extended Data Fig. 5c, d). Thus, we were able to directly confirm 
that serine-driven folate metabolism contributes to NADP⁺ reduction. 

To assess the functional significance of different pathways to NADPH 
homeostasis, in HEK293T cells we knocked down a variety of potential 
NADPH-producing enzymes and measured the cellular NADPH/NADP⁺ 
ratio (Fig. 2f). Although knockdown of malic enzyme 1 (ME1), 
cytosolic or mitochondrial NADP-dependent isocitrate dehydrogenase 
(IDH1 and IDH2), and transhydrogenase (NNT) did not significantly 
impact NADPH/NADP⁺, knockdown of glucose-6-phosphate 
dehydrogenase or either isozyme of methylene tetrahydrofolate dehy- 
drogenase (MTHFD1, cytosolic, or MTHFD2, mitochondrial) substan- 
tially decreased it. These observations further support the primacy, at 
least in this growing cell line, of the pentose phosphate and folate path- 
ways in NADPH production. 

The importance of both isozymes of methylene tetrahydrofolate dehy- 
drogenase suggests that cytosolic and mitochondrial folate metabol- 
ism (Fig. 3a) both contribute to NADPH homeostasis. The product of 
methylene tetrahydrofolate dehydrogenase, 10-formyl-THF, is a required 
purine precursor, with each purine ring containing two formyl groups. 
Thus, the cytosolic 10-formyl-THF production rate must be at least twice 
the purine biosynthetic flux. The most direct path to cytosolic 10-formyl- 
THF is via MTHFD1 with concomitant NADPH production (Fig. 3a, 
solid blue lines). Alternatively, 10-formyl-THF could potentially be made 
from formate initially generated in the mitochondrion (Fig. 3a, dashed 
lines). To distinguish between these possibilities, we administered
[U-¹³C]glycine, which contributes selectively to mitochondrial one-
carbon pools (Fig. 3a, green lines). Glycine is assimilated intact into
purines, resulting in M+2 labelling of ATP; however, we did not ob-
serve any M+1, M+3 or M+4 ATP, indicating that mitochondrial-
derived one-carbon units do not contribute to purine biosynthesis (Fig. 3b).
Consistent with this, supplying [U-¹³C]serine revealed that most one-
carbon units assimilated into purines come from serine (Extended Data
Fig. 6a, b), and knockdown of MTHFD1 nearly eliminated NADPH
redox-active hydrogen labelling from [2,3,3-²H]serine (Fig. 3c). Assum-
ing that all 10-formyl-THF production for purine synthesis is coupled
via MTHFD1 to NADP⁺ reduction, the total NADPH production rate
is ~2 nmol µl⁻¹ h⁻¹ (Fig. 3d) or ~20% of total cytosolic NADPH flux.
To probe potential further oxidation of serine, we administered [3-¹⁴C]serine
and observed ¹⁴CO₂ release, a result implying that the THF pathway
runs in excess of one-carbon demand yielding additional NADPH (Fig. 3d
and Extended Data Fig. 7).

Figure 2 | Pathways contributing to NADPH production. a, Canonical
NADPH production pathways. b, NADPH and NADP⁺ isotopic distribution
(without correction for natural isotope abundances) after incubation with
[2,3,3,4,4-²H]glutamine tracer to probe NADPH production via glutamate
dehydrogenase and malic enzyme (HEK293T cells, 48 h). See also Extended
Data Fig. 4. c, NADPH and NADP⁺ isotopic distribution as in b using
[2,3,3-²H]aspartate tracer to probe NADPH production via IDH. See also
Extended Data Fig. 4. d, NADPH production routes predicted by
experimentally constrained genome-scale flux balance analysis. e, NADPH
and NADP⁺ isotopic distribution as in b using the [2,3,3-²H]serine tracer to

We also investigated the consequences of elimination of serine from
the medium (Extended Data Fig. 8). As has been observed previously
both in vitro and in tumour models, serine depletion impaired cell
growth (Extended Data Fig. 8b). Consistent with one important down-
stream product of serine being NADPH, its removal decreased NADPH/
NADP⁺ (Extended Data Fig. 8c). Glycine is both a product of serine me-
tabolism, and itself a potential source of one-carbon units via the mito-
chondrial glycine cleavage system, whose expression has been linked to
oncogenic transformation. We therefore tested the impact of both remov-
ing serine and increasing glycine in the culture media. We found that in-
creased glycine further impaired cell growth and decreased the NADPH/
NADP⁺ ratio (Extended Data Fig. 8b, c). These results are consistent
with increased glycine impairing methylene-THF production, perhaps 
due to reverse flux through serine hydroxymethyltransferase (Extended 
Data Fig. 8d, e). 

The above results establish a major contribution of serine-driven one-
carbon metabolism to NADPH homeostasis. Knockdown of MTHFD2
also alters NADPH/NADP⁺, suggesting an additional role for mitochon-
drial one-carbon metabolism. Mitochondrial folate-dependent enzymes,
especially MTHFD2, are overexpressed across human cancers. To probe
specifically mitochondrial folate metabolism, we administered ¹⁴C-labelled
glycine and monitored radioactive CO₂ release. The glycine cleavage
system releases glycine C1 as CO₂, while transferring glycine C2 to THF,
making methylene-THF. Notably, almost as much radioactive CO₂ was
released from [2-¹⁴C]glycine as from [1-¹⁴C]glycine (Fig. 3e), indicating
that a majority of mitochondrial methylene-THF is fully oxidized to CO₂.
Consistent with such complete oxidation, when we administered ¹³C-
labelled glycine, we did not observe transfer of one-carbon units to the
cytosol based on the thymidine triphosphate (dTTP) or methionine label-
ling, with dTTP’s one-carbon unit coming from serine (90-100%) and
methionine coming from the medium (Extended Data Fig. 6c-f). As ex-
pected on the basis of the mitochondrial methylene-THF oxidation
pathway, release of glycine C2 as CO₂ was decreased by knockdown of
either MTHFD2 or ALDH1L2 (Extended Data Fig. 7g). Such complete
one-carbon unit oxidation may be beneficial for reducing the cellular 
glycine concentration. In addition, it produces mitochondrial NADPH. 
Thus, two functions of mitochondrial folate metabolism are glycine 
detoxification and NADPH production. 

probe NADPH production via folate metabolism (no glycine in the media). See
also Extended Data Fig. 5. f, Relative NADPH/NADP⁺ ratio in HEK293T cells
with knockdown of various potential NADPH-producing enzymes: glucose-
6-phosphate dehydrogenase (G6PD), cytosolic malic enzyme (ME1),
cytosolic and mitochondrial isocitrate dehydrogenase (IDH1 and IDH2),
transhydrogenase (NNT), and cytosolic and mitochondrial methylene
tetrahydrofolate dehydrogenase (MTHFD1 and MTHFD2). Plotted ratios
are relative to vector control knockdown. Results are mean ± s.d., n = 2
biological replicates from a single experiment and results were confirmed in
multiple experiments.


One important role of NADPH is antioxidant defence. Consistent 
with folate metabolism being a substantial NADPH producer, antifo-
lates have been found to induce oxidative stress. To more directly link
folate-mediated NADPH production with cellular redox defenses, we 
measured glutathione, reactive oxygen species and hydrogen peroxide 
sensitivity of MTHFD1 and MTHFD2 knockdown cells. Knockdown 
of either isozyme decreased the ratio of reduced to oxidized glutathione 
(Fig. 3f) and impaired resistance to oxidative stress induced by hydrogen 
peroxide (Fig. 3g, h) or diamide (Fig. 3i). MTHFD2 knockdown specif- 
ically increased reactive oxygen species (Fig. 3j), and ALDH1L2 knock- 
down decreased the ratio of reduced to oxidized glutathione (Extended 
Data Fig. 7h), demonstrating that the complete mitochondrial methylene- 
THF oxidation pathway is required for redox homeostasis. 

A major open question regards the relative use of NADPH for bio- 
synthesis versus redox defence. To address this, we compared total 
cytosolic NADPH production (as measured above) to consumption for 
biosynthesis (Fig. 4a, Methods) based on the measured cellular content 
of DNA, amino acids and lipids; their production routes (measured by 
¹³C tracer experiments, see Methods); and cellular growth rate (Extended
Data Fig. 9a-g). The overall demand for NADPH for biosynthesis is
>80% of total cytosolic NADPH production (Fig. 4b), with most of this
NADPH consumed by fatty acid synthesis. At least in transformed cells 
growing under aerobic conditions, most cytosolic NADPH is devoted 
to biosynthesis, not redox defence. 

To evaluate NADPH consumption for redox defence under overt 
redox stress, we treated HEK293T cells with hydrogen peroxide at a con- 
centration that blocks growth without causing substantial cell death and 
measured the total cytosolic NADPH production rate. The rate was 
5.5 nmol µl⁻¹ h⁻¹, about half that in freely growing cells (Extended Data
Fig. 9h). Thus, consistent with the majority of cytosolic NADPH in 
growing cells being used for biosynthesis, growth-inhibiting oxidative 
stress decreases cytosolic NADPH production. 

The production of NADPH by the oxidative pentose phosphate path- 
way, which makes the nucleotide building block ribose, and by the 10- 
formyl-THF pathway, which contributes to purine synthesis, leads to 
an inherent coupling of nucleotide synthesis with NADPH production. 
These reactions together produce in growing cells roughly the amount 
of NADPH required for replication of cellular lipids (Fig. 4b). Inter- 
ruption of this intrinsic coordination by feeding of purines can impair 
cell growth. In non-growing cells, or other cases in which NADPH
needs outstrip production coupled to nucleotide synthesis, it is likely
that alternative pathways, for example, malic enzyme and IDH, will be
of greater importance than observed here.

Figure 3 | Quantification of folate-dependent NADPH production. a, Pathway
schematic with serine C3 in blue, glycine C1 in red and glycine C2 in green.
b, Glycine and ATP labelling pattern after incubation with [U-¹³C]glycine
(HEK293T cells, 24 h). The lack of M+3 and M+4 ATP indicates that no
glycine-derived one-carbon units contribute to purine synthesis. c, Fraction of
NADPH labelled at the redox active hydrogen after 24 h incubation with
[2,3,3-²H]serine in HEK293T cells with stable MTHFD1 or MTHFD2
knockdown. Same cell lines used also in f-j. d, Absolute rate of cytosolic
folate-dependent NADPH production. e, CO₂ release rate from glycine C1 and
glycine C2. f, GSH/GSSG ratio. g, Relative growth, normalized to untreated
samples, during 48 h exposure to H₂O₂. h, Cell death after 24 h exposure to
250 µM H₂O₂. i, Cell death after 24 h exposure to 300 µM diamide.
j, Relative reactive oxygen species (ROS) levels measured using
dichlorodihydrofluorescein diacetate (DCFH-DA) assay. Mean ± s.d., n = 3.

The contribution of the 10-formyl-THF pathway to NADPH produc- 
tion is particularly interesting in light of the importance of metabolism 
of serine and glycine, the major carbon sources of this pathway, to cancer 
growth. Serine synthesis is promoted by the cancer-associated M2 iso-
zyme of pyruvate kinase (PKM2) and by amplification of 3-phosphoglycerate
dehydrogenase. The present data suggest that serine serves dual roles
in providing both one-carbon units and NADPH. In this respect, it is
intriguing that PKM2, in addition to sensing serine, is inactivated by
oxidative stress. Such inactivation should increase 3-phosphoglycerate
and thus potentially serine-driven NADPH production. 

In addition to synthesizing serine, rapidly growing cells avidly con-
sume glycine. Intriguingly, although only intact glycine (and not glycine-
derived one-carbon units) is incorporated into purines, knockdown of
the glycine cleavage system impairs cancer growth. We find that a


majority of glycine-derived one-carbon units are fully oxidized, argu- 
ing against the glycine cleavage system’s primary role, at least in the 
tested cell lines, being to release one-carbon units to the cytosol. Instead, 
its function may be simultaneous elimination of unwanted glycine and
production of mitochondrial NADPH.

Figure 4 | Comparison of NADPH production and consumption. a, Main
NADPH consumption pathways. b, NADPH production and consumption
fluxes. Mean ± s.d., with error bar showing the variation of total production or
consumption, n = 3.

Understanding NADPH’s production and consumption routes is 
essential to a global understanding of metabolism. The approaches pro- 
vided here will enable evaluation of these routes in different cell types and 
environmental conditions. Analogous measurements for ATP, achieved
first more than a half century ago, have formed the foundation for
much of subsequent metabolism research. Given NADPH’s compar-
able significance in medically important processes including lipogen-
esis, oxidative stress, and tumour growth, quantitative analysis of its
metabolism may prove of similar importance.


METHODS SUMMARY 


Cells were grown in Dulbecco’s modified Eagle medium (DMEM) without pyruvate
(CELLGRO) with 10% dialysed fetal bovine serum (Invitrogen) in 5% CO₂ at 37 °C
and harvested at ~80% confluency. Stable knockdown cell lines were generated by
shRNA-expressing lentivirus with puromycin selection. IDH1, IDH2 and ALDH1L2
knockdown was generated by transfecting cells with siRNA. For confirmation of
knockdown, see Extended Data Fig. 10. For metabolite measurements, metabolism
was quenched and metabolites extracted by aspirating media and immediately add-
ing −80 °C 80:20 methanol:water. Supernatants from two rounds of extraction were
combined, dried under N₂, resuspended in water, placed in a 4 °C autosampler, and
analysed within 6 h by reversed-phase ion-pairing chromatography negative-mode
electrospray-ionization high-resolution MS on a stand-alone orbitrap (Thermo).
Fluxes from ¹⁴C-labelled substrates to CO₂ were measured by adding trace ¹⁴C-labelled
nutrient to normal culture media, quantifying radioactive CO₂ release, and correct-
ing for intracellular substrate labelling according to percentage of radioactive tracer
in the media and fraction of particular intracellular metabolite deriving from media
uptake, as measured using ¹³C-tracer. To assess the potential contribution of vari-
ous metabolic pathways to NADPH production, we analysed feasible steady-state
fluxes of a genome-scale human metabolic network model constrained by experi-
mentally measured uptake and excretion fluxes and growth rate of the iBMK cell
line. The flux balance equations were solved in MATLAB with the objective func-
tion formulated to minimize the total sum of fluxes. NADPH consumption by reduc-
tive biosynthesis was determined based on reaction stoichiometries, experimentally
measured cellular biomass composition, growth rate, fractional de novo synthesis
of fatty acids (by ¹³C-labelling from [U-¹³C]glucose and [U-¹³C]glutamine), and
fractional synthesis of proline from glutamate versus arginine (by ¹³C-labelling
from [U-¹³C]glutamine). Correction for the deuterium kinetic isotope effect was
based on the assumption that total metabolic fluxes are not impacted. Let x be the
fractional labelling of the relevant substrate hydrogen, F_U be the NADPH pro-
duction flux from unlabelled substrate and F_L be the NADPH production flux from
the labelled substrate.


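$$\frac{F_L}{F_U} = \frac{x/(1-x)}{V_H/V_D} \qquad (2)$$

$$F_{\mathrm{reaction}} = F_L + F_U = \frac{F_L}{x}\left[\frac{V_H}{V_D} + x\left(1 - \frac{V_H}{V_D}\right)\right] \qquad (3)$$

F_L/x is the flux in cases without a discernible kinetic isotope effect (for example, for
¹³C). The remaining term is the correction factor for the kinetic isotope effect:

$$\frac{V_H}{V_D} + x\left(1 - \frac{V_H}{V_D}\right) \qquad (4)$$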
Methods
Cell lines and culture conditions. HEK293T (large T antigen-transformed human
embryonic kidney cells) and MDA-MB-468 (triple-negative human breast cancer cells)
were purchased from ATCC. Immortalized baby mouse kidney epithelial cells (iBMK)
with and without myr-AKT were a gift of Eileen White. All cell lines were grown in
Dulbecco’s modified Eagle medium (DMEM) without pyruvate (CELLGRO), supple-
mented with 10% dialysed fetal bovine serum (Invitrogen) in a 5% CO₂ incubator at 37 °C.
Enzymes were knocked down by infection with lentivirus expressing the cor-
responding shRNA: shMTHFD1,#1:CCGGGCTGAAGAGATTGGGATCAAACT
CGAGTTTGATCCCAATCTCTTCAGCTTTTTG,#2:CCGGGCCATTGATGCTC 
GGATATTTCTCGAGAAATATCCGAGCATCAATGGCTTTTTG; shMTHFD2,#1: 
CCGGGCAGTTGAAGAAACATACAATCTCGAGATTGTATGTTTCTTCAA 
CTGCTTTTTG, #2:CCGGGCTGGGTATATCACTCCAGTTCTCGAGAACTG 
GAGTGATATACCCAGCTTTTTG; shG6PD,#1:CCGGCAACAGATACAAGA 
ACGTGAACTCGAGTTCACGTTCTTGTATCTGTTGTTTTTG, #3:CCGGGC 
TGATGAAGAGAGTGGGTTTCTCGAGAAACCCACTCTCTTCATCAGCTTT 
TTG; shNNT:CCGGCCCTATGGTTAATCCAACATTCTCGAGAATGTTGGA 
TTAACCATAGGGTTTTTG; shME1,#1:CCGGGCCTTCAATGAACGGCCTA 
TTCTCGAGAATAGGCCGTTCATTGAAGGCTTTTTG, #2:CCGGCCAACAA 
TATAGTTTGGTGTTCTCGAGAACACCAAACTATATTGTTGGTTTTTG 
and puromycin selection. To obtain the shRNA-expressing virus, pLKO-shRNA 
vectors (Sigma-Aldrich) were cotransfected with the third generation lentivirus pack- 
aging plasmids (pMDLg, pCMV-VSV-G and pRsv-Rev) into HEK293T cells using 
FuGENE 6 Transfection Reagent (Promega), fresh media added after 24h, and 
viral supernatants collected at 48 h. Target cells were infected by viral supernatant 
(diluted 1:1 with DMEM; 6 µg ml⁻¹ polybrene), fresh DMEM added after 24 h, and
selection with 3 µg ml⁻¹ puromycin initiated at 48 h and allowed to proceed for 2
to 3 days. Thereafter, cells were maintained in DMEM with 1 µg ml⁻¹ puromycin.
For IDH1, IDH2 and ALDH1L2 knockdown, siRNA targeting IDH1 or IDH2 (Thermo
Scientific, 40 nM) or ALDH1L2 (Santa Cruz, 30 nM) were transfected into HEK293T
cells using Lipofectamine RNAiMAX (Invitrogen). Knockdown of the enzymes
was confirmed by immunoblotting with commercial antibodies: G6PD (Bethyl Labo-
ratories), MTHFD1 and MTHFD2 (Abgent), IDH1 (Proteintech Group), IDH2
(Abcam) and ALDH1L2 (Santa Cruz) or quantitative RT-PCR probes (ME1 and
NNT, Applied Biosystems) (Extended Data Fig. 10). For enzymes with more than
one successful knockdown sequence, data presented here are mean ± s.d. of inde-
pendent experiments using different shRNA sequences.
Measurement of metabolite concentrations and labelling patterns. Cells were col- 
lected at 80% confluency. For metabolomic experiments, medium was replaced every 
2 days and additionally 2 h before metabolome collection and/or isotope tracer addi-
tion. Metabolism was quenched and metabolites extracted by aspirating media and
immediately adding 80:20 methanol:water at −80 °C. Supernatants from two rounds
of methanol:water extraction were combined, dried under N₂, resuspended in HPLC
water, placed in a 4 °C autosampler, and analysed within 6 h to avoid NADPH degradation.
The LC-MS method involved reversed-phase ion-pairing chromatography
coupled by negative mode electrospray ionization to a stand-alone orbitrap mass
spectrometer (Thermo Scientific) scanning from m/z 85-1,000 at 1 Hz at 100,000
resolution with LC separation on a Synergi Hydro-RP column (100 mm × 2 mm,
2.5 µm particle size, Phenomenex, Torrance, CA) using a gradient of solvent A
(97%:3% H₂O:MeOH with 10 mM tributylamine and 15 mM acetic acid), and
solvent B (100% MeOH). The gradient was 0 min, 0% B; 2.5 min, 0% B; 5 min, 20%
B; 7.5 min, 20% B; 13 min, 55% B; 15.5 min, 95% B; 18.5 min, 95% B; 19 min, 0% B;
25 min, 0% B. Injection volume was 10 µl, flow rate 200 µl min⁻¹ and column tem-
perature 25 °C. Data were analysed using the MAVEN software suite. Data from
¹³C-labelling experiments were adjusted for natural ¹³C abundance and impurity
of labelled substrate; those from ²H-labelling were not adjusted (natural ²H abun-
dance is negligible). The absolute concentration of 6-phosphogluconate was quan-
tified by comparing the signal of ¹³C-labelled intracellular compound (from feeding
[U-¹³C]glucose) to the signal of unlabelled internal standard.
Fractional labelling of NADPH redox active site. The fractional NADPH redox
active site labelling (x) was measured from the observed NADPH and NADP⁺ label-
ling patterns from the same sample. We calculated x to best fit the steady-state mass
distribution vectors of NADPH and NADP⁺ (M_NADPH and M_NADP⁺) by least-squares
fitting in MATLAB (function: lsqcurvefit).

$$M_{\mathrm{NADP^+}} = \begin{bmatrix} m_0 \\ m_1 \\ m_2 \\ \vdots \\ m_N \end{bmatrix} \quad \begin{matrix} \mathrm{M}+0 \\ \mathrm{M}+1 \\ \mathrm{M}+2 \\ \vdots \\ \mathrm{M}+N \end{matrix}$$
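
The fit of x can be sketched as follows. This is an illustration only: it assumes (the text above does not spell the model out) that labelling of the redox-active hydrogen shifts a fraction x of the NADP⁺ mass distribution up by one mass unit, and it uses a closed-form one-parameter least-squares solution in place of MATLAB's lsqcurvefit. The function names and the example vectors are hypothetical.

```python
def shift_by_one(mdv):
    """Mass distribution after adding one heavy atom (M+i becomes M+i+1)."""
    return [0.0] + list(mdv)


def predict_nadph(m_nadp, x):
    """Predicted NADPH distribution for redox-site labelling fraction x."""
    unshifted = list(m_nadp) + [0.0]
    shifted = shift_by_one(m_nadp)
    return [(1 - x) * u + x * s for u, s in zip(unshifted, shifted)]


def fit_redox_labelling(m_nadp, m_nadph):
    """Least-squares estimate of x, clipped to the interval [0, 1]."""
    unshifted = list(m_nadp) + [0.0]
    shifted = shift_by_one(m_nadp)
    # Pad the observed vector with zeros so all vectors have equal length.
    observed = list(m_nadph) + [0.0] * (len(unshifted) - len(m_nadph))
    direction = [s - u for s, u in zip(shifted, unshifted)]
    residual = [o - u for o, u in zip(observed, unshifted)]
    numerator = sum(d * r for d, r in zip(direction, residual))
    denominator = sum(d * d for d in direction)
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical data: NADP+ measured as M+0..M+2, NADPH simulated with x = 0.30.
m_nadp = [0.70, 0.25, 0.05]
m_nadph = predict_nadph(m_nadp, 0.30)
x_fit = fit_redox_labelling(m_nadp, m_nadph)
```

Because the model is linear in x, the least-squares optimum has a closed form; a general-purpose solver would only be needed if further parameters were fitted.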

Network analysis of potential NADPH producing pathways. To assess the poten-
tial contribution of various metabolic pathways to NADPH production, we ana-
lysed feasible steady-state fluxes of a genome-scale human metabolic network model.
The glucose (98 nmol/(µl × h)), glutamine (40 nmol/(µl × h)), and oxygen uptake
rates (21 nmol/(µl × h)); and lactate (143 nmol/(µl × h)), alanine (2 nmol/(µl × h)),
pyruvate (15 nmol/(µl × h)), and formate (<0.25 nmol/(µl × h)) excretion rates
were set to experimentally measured fluxes in the iBMK cell line, as measured by a
combination of electrochemistry (glucose, glutamine, lactate on YSI7200 instru-
ment, YSI, Yellow Springs, OH), LC-MS (alanine, pyruvate with isotopic internal
standards), fluorometry (oxygen on XF24 flux analyser, Seahorse Bioscience, North
Billerica, MA), and NMR (formate by ¹H 500 MHz, Bruker, 10 µM limit of detec-
tion). The uptake of amino acids from DMEM media was bounded to not more
than a third of that of glutamine, which is a loose constraint relative to experimental
observations in iBMK cells and in NCI-60 cells. Biomass requirements were based
on the experimentally determined growth rate of the iBMK cell line with protein,
fatty acids and nucleotides accounting for 60%, 10% and 10% of the total cellular dry
mass, respectively, based on experimental measurements. Steady-state intracellular
fluxes that best fit these experimental constraints were then selected by solving the
flux balance equations in MATLAB with the objective function formulated to mini-
mize the sum of total fluxes.

Correction for deuterium’s kinetic isotope effect. The kinetic isotope effect
(V_H/V_D) for isolated NADPH producing enzymes ranges from 1.8 to 4, with isolated
G6PD and 6-phosphogluconate dehydrogenase having V_H/V_D = 1.8 (refs 10, 11).
However, cellular homeostatic mechanisms (including flux control being distrib- 
uted across multiple pathway enzymes) may result in a lesser impact on labelling 
patterns in cells. 

The smallest reasonable correction for the deuterium kinetic isotope effect is based 
on the assumption that total metabolic fluxes are not affected. This correction was 
used as the default in this work. Let x be the fractional labelling of the relevant
substrate hydrogen, F_U be the NADPH production flux from unlabelled substrate
and F_L be the NADPH production flux from the labelled substrate.

$$\frac{F_L}{F_U} = \frac{x/(1-x)}{V_H/V_D} \qquad (2)$$

$$F_{\mathrm{reaction}} = F_L + F_U = \frac{F_L}{x}\left[\frac{V_H}{V_D} + x\left(1 - \frac{V_H}{V_D}\right)\right] \qquad (3)$$

F_L/x is the flux in cases without a discernible kinetic isotope effect (for example, for
¹³C). The remaining term is the correction factor for the kinetic isotope effect:

$$\frac{V_H}{V_D} + x\left(1 - \frac{V_H}{V_D}\right)$$
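
As a numerical check on equations (2) and (3), the sketch below computes the smallest-correction factor and confirms that the total flux equals the sum of the labelled and unlabelled fluxes. The function names and the input values are illustrative assumptions; only V_H/V_D = 1.8 is taken from the text.

```python
def kie_correction(x, vh_vd):
    """Smallest correction factor: V_H/V_D + x * (1 - V_H/V_D)."""
    return vh_vd + x * (1.0 - vh_vd)


def unlabelled_flux(f_labelled, x, vh_vd):
    """F_U from equation (2): F_L / F_U = (x / (1 - x)) / (V_H / V_D)."""
    return f_labelled * vh_vd * (1.0 - x) / x


def total_flux(f_labelled, x, vh_vd):
    """F_reaction from equation (3): (F_L / x) * correction factor."""
    return f_labelled / x * kie_correction(x, vh_vd)


# Hypothetical inputs: half of the substrate hydrogen labelled, V_H/V_D = 1.8,
# and a labelled flux of 1.0 in arbitrary units.
x, vh_vd, f_l = 0.5, 1.8, 1.0
f_u = unlabelled_flux(f_l, x, vh_vd)
f_total = total_flux(f_l, x, vh_vd)
```

With fully labelled substrate (x = 1) the factor reduces to 1, and as x approaches 0 it approaches V_H/V_D.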


The largest reasonable correction for the deuterium kinetic isotope effect is based
on the assumption that pathway flux is decreased by the introduction of ²H-labelled
tracer equivalent to the decrease in activity of the associated enzyme observed
in vitro:

$$C_{\mathrm{KIE}} = \frac{V_H/V_D}{1 + N \times \left((V_H/V_D) - 1\right) \times x_{\mathrm{NADPH}}} \qquad (6)$$

in which N is the number of NADPH produced per substrate molecule passing
through the pathway. For the oxidative pentose phosphate pathway, N = 2. Note
that the effect of the kinetic isotope effect on [²H]NADPH production may be partially
offset by an analogous (albeit smaller) kinetic isotope effect in [²H]NADPH-con-
suming reactions. V_H/V_D for fatty acid synthetase is ~1.1 (ref. 36). The effect of
different mechanisms of correcting for the deuterium kinetic isotope effect is shown in
Extended Data Fig. 1.

Quantifying absolute oxidative pentose phosphate pathway flux based on 6-
phosphogluconate labelling kinetics. To quantify the absolute oxidative pentose
phosphate pathway flux, cells were switched to media containing [U-¹³C]glucose,
and the kinetics of glucose-6-phosphate and 6-phosphogluconate labelling were mea-
sured. The unlabelled fraction of 6-phosphogluconate decays with time as:

$$\frac{d[\text{6-phosphogluconate}]^{\mathrm{unlabelled}}}{dt} = -F_{\mathrm{oxPPP}} \times \frac{[\text{6-phosphogluconate}]^{\mathrm{unlabelled}}}{[\text{6-phosphogluconate}]^{\mathrm{total}}} + F_{\mathrm{oxPPP}} \times \mathrm{Fraction}^{\mathrm{unlabelled}}_{\mathrm{G6P}}(t) \qquad (7)$$

where F_oxPPP is the flux of the oxidative pentose phos-
phate pathway, [6-phosphogluconate]^total is the total cellular 6-phosphogluconate
concentration, which was directly measured, and Fraction^unlabelled_G6P(t) is the unla-
belled fraction of glucose-6-phosphate at time t, which decays exponentially. F_oxPPP
was obtained by least-squares fitting as per ref. 37.
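
A minimal sketch of how equation (7) can be used to recover F_oxPPP. It assumes the unlabelled glucose-6-phosphate fraction decays as exp(−k·t), in which case equation (7) has a closed-form solution, and it replaces the least-squares routine of ref. 37 with a plain grid search over candidate fluxes. All names, parameter values and the synthetic data are hypothetical.

```python
import math


def unlabelled_6pg_fraction(t, flux, pool, k_g6p):
    """Unlabelled 6-phosphogluconate fraction at time t.

    Solves dp/dt = (flux / pool) * (exp(-k_g6p * t) - p) with p(0) = 1,
    which is equation (7) divided by the total pool size.
    """
    a = flux / pool  # turnover rate constant of the 6-phosphogluconate pool
    if abs(a - k_g6p) < 1e-12:
        return (1.0 + a * t) * math.exp(-a * t)
    return (a * math.exp(-k_g6p * t) - k_g6p * math.exp(-a * t)) / (a - k_g6p)


def fit_flux(times, observed, pool, k_g6p, candidates):
    """Return the candidate flux with the smallest sum of squared errors."""
    def sse(flux):
        return sum(
            (unlabelled_6pg_fraction(t, flux, pool, k_g6p) - y) ** 2
            for t, y in zip(times, observed)
        )
    return min(candidates, key=sse)


# Hypothetical values: pool in nmol per µl cells, rates per hour.
pool = 0.05
k_g6p = 20.0
true_flux = 2.0
times = [minutes / 60.0 for minutes in (0, 1, 2, 5, 10, 20)]
observed = [unlabelled_6pg_fraction(t, true_flux, pool, k_g6p) for t in times]
candidates = [0.1 * i for i in range(1, 101)]
best_flux = fit_flux(times, observed, pool, k_g6p, candidates)
```

With real data the grid search would normally be replaced by a continuous optimizer, and k for glucose-6-phosphate would itself be fitted from the measured labelling time course.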

Quantifying the upper limit of NADPH production via malic enzyme by ¹³C
labelling. Malic enzyme (ME) can produce either NADH or NADPH. Thus, total
malic enzyme flux puts an upper limit on the associated NADPH production. To probe
overall malic enzyme activity, cells were incubated with [U-¹³C]glutamine for 48 h,
which resulted in the majority of intracellular malate being uniformly labelled (¹³C₄),
with a small portion being ¹³C₃. For simplicity, we assume that ¹³C₃-malate is an
equal mix of [1,2,3-¹³C₃]malate and [2,3,4-¹³C₃]malate owing to rapid interconver-
sion with fumarate (which is symmetric). Malic enzyme produces [¹³C₃]pyruvate
from both [¹³C₄]malate and [1,2,3-¹³C₃]malate, whereas glycolysis produces unla-
belled pyruvate (see Extended Data Fig. 4).

$$\mathrm{Flux}_{\mathrm{NADPH,ME}} \le \frac{[^{13}\mathrm{C}_3]\text{Pyruvate}}{\text{Total pyruvate}} \times \frac{\text{Total malate}}{[^{13}\mathrm{C}_4]\text{Malate} + 0.5\,[^{13}\mathrm{C}_3]\text{Malate}} \times \mathrm{Flux}_{\mathrm{glycolysis}} \qquad (8)$$
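
Equation (8) can be evaluated directly from labelling fractions. The sketch below is illustrative: the function name, argument names and numerical inputs are assumptions rather than measured values.

```python
def me_nadph_upper_bound(pyr_m3, pyr_total, mal_m4, mal_m3, mal_total,
                         glycolysis_flux):
    """Upper bound on malic-enzyme NADPH flux according to equation (8)."""
    labelled_pyruvate_fraction = pyr_m3 / pyr_total
    # Fraction of malate able to yield [13C3]pyruvate: all 13C4 malate plus
    # the half of 13C3 malate assumed to be the 1,2,3-labelled isomer.
    precursor_fraction = (mal_m4 + 0.5 * mal_m3) / mal_total
    return labelled_pyruvate_fraction / precursor_fraction * glycolysis_flux


# Hypothetical labelling fractions and a glycolytic flux in nmol µl^-1 h^-1.
bound = me_nadph_upper_bound(
    pyr_m3=0.02, pyr_total=1.0,
    mal_m4=0.80, mal_m3=0.10, mal_total=1.0,
    glycolysis_flux=100.0,
)
```

The result is an upper limit only, because malic enzyme can reduce NAD⁺ as well as NADP⁺.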


Estimation of fractional contribution of MTHFD to NADPH production based
on ²H-serine labelling. Similar to quantifying relative contribution of oxidative
pentose phosphate pathway to cytosolic NADPH production, the contribution of
THF-pathway can be estimated from [²H]serine labelling as follows:

$$\mathrm{Fraction}_{\text{NADPH from THF-pathway}} = \frac{[^{2}\mathrm{H}]\mathrm{NADPH}}{\text{Total NADPH}} \times \frac{\text{Total methylene-THF}}{[^{2}\mathrm{H}]\text{methylene-THF}} \qquad (9)$$

Existing methods do not allow direct measurement of methylene-THF labelling,
but such labelling can be approximated based on intracellular serine labelling (for-
mally, the [²H]serine labelling places an upper bound on [²H]methylene-THF labelling).

$$\mathrm{Fraction}_{\text{NADPH from THF-pathway}} = \frac{[^{2}\mathrm{H}]\mathrm{NADPH}}{\text{Total NADPH}} \times \frac{\text{Total serine}}{[^{2}\mathrm{H}]\text{serine}} \times C_{\mathrm{KIE}}(\mathrm{MTHFD}) \qquad (10)$$

MTHFD1 has a deuterium kinetic isotope effect V_H/V_D of ~3.

Measurement of ¹⁴CO₂ release. Radioactive CO₂ released by cells from position-
ally labelled substrates was measured by trapping the CO₂ in filter paper saturated
with 10 M KOH as previously described. Cells were grown in tissue culture flasks
with DMEM medium with less than normal bicarbonate (0.74 g per l) and addition
of HEPES buffer (6 g per l, pH 7.4). At the beginning of the experiment, a trace amount
of the desired ¹⁴C-labelled tracer was added to the media. For each cell line, the amount
was selected to be the minimum that gives a sufficient radioactive CO₂ signal to
quantitate accurately (~1 µCi ml⁻¹). All knockdown lines were treated identically
to their corresponding parental line. Then the flask was sealed with a rubber stopper
with a central well (Kimble Chase) containing a piece of filter paper saturated with
10 M KOH solution. The flasks were incubated at 37 °C for 24 h. CO₂ released by
cells was absorbed by the base (that is, KOH) in the central well. Metabolism was
stopped by injection of 1 ml 3 M acetic acid solution through the rubber stopper.
The flasks were then incubated at room temperature for 1 h to ensure all the CO₂
dissolved in media was released and absorbed into the central well. The filter paper
and all the liquid in the central well were transferred to a scintillation vial containing 15 ml
liquid scintillation cocktail (PerkinElmer). The central well was washed with 100 µl
of water twice, and the water was added to the same scintillation vial. Radioactivity
was measured by liquid scintillation counting. In parallel, the same experiments
were performed using [U-¹³C]-labelled nutrient (in amounts that fully replaced the
unlabelled nutrient in DMEM) and the extent of labelling of the intracellular metab-
olite that is the substrate of the CO₂-releasing reaction was measured by LC-MS.
Absolute CO₂ release rates from the nutrients of interest were calculated as follows:

$$\mathrm{Rate}_{\mathrm{CO_2}\ \text{from nutrient}_i} = \frac{\mathrm{Rate}_{^{14}\mathrm{CO_2}\ \text{from}\ ^{14}\text{C-labelled tracer}_i}\ [\mu\mathrm{Ci}/\mathrm{h}/\mu\mathrm{l\ cells}]}{\text{Overall media tracer}_i\ \text{activity}\ [\mu\mathrm{Ci}/\mathrm{nmol}]} \times \frac{1}{\mathrm{Fraction}_{\text{intracellular compound}_i\ \text{from media}}} \qquad (11)$$
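
The unit conversion in equation (11) can be illustrated with a short calculation. The function name, argument names and input values are assumptions chosen for the example.

```python
def absolute_co2_release(radioactive_rate, media_specific_activity,
                         fraction_from_media):
    """Absolute CO2 release rate (nmol per h per µl cells), equation (11).

    radioactive_rate: 14CO2 release in µCi per h per µl cells.
    media_specific_activity: tracer activity in the media in µCi per nmol.
    fraction_from_media: fraction of the intracellular substrate that
        derives from media uptake (measured with the 13C tracer).
    """
    return radioactive_rate / media_specific_activity / fraction_from_media


# Hypothetical measurement: 2e-4 µCi/h/µl cells released, media specific
# activity of 1e-3 µCi/nmol, and 80% of the intracellular pool from media.
rate = absolute_co2_release(2e-4, 1e-3, 0.8)
```

Dividing by the media-derived fraction corrects for dilution of the tracer by unlabelled substrate synthesized inside the cell.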

Fractional labelling of cytosolic formyl groups from [U-¹³C]serine. Cells were
cultured with media containing [U-¹³C]serine for 48 h, washed three times with
cold PBS to remove extracellular serine, extracted, and the intracellular labelling
pattern analysed by LC-MS for ATP (representing purines; there is no labelling of
ribose-phosphate based on LC-MS measurements), glycine and serine. The purine
ring has 5 carbons: 1 from CO₂, 2 from glycine and 2 from formyl groups (from 10-
formyl-THF). Assuming that CO₂ labelling is negligible, which is realistic for cells
grown in a 5% CO₂ incubator, let X_ATP-i and X_Gly-j represent the experimentally
observed fractions of ATP and glycine with i and j labelled carbons. The cytosolic 10-
formyl-THF labelling fraction, y, was fit by least squares:

$$\begin{aligned}
X_{\mathrm{ATP}\text{-}0} &= X_{\mathrm{Gly}\text{-}0} \times (1-y)^2 \\
X_{\mathrm{ATP}\text{-}1} &= 2 \times X_{\mathrm{Gly}\text{-}0} \times y(1-y) \\
X_{\mathrm{ATP}\text{-}2} &= X_{\mathrm{Gly}\text{-}2} \times (1-y)^2 + X_{\mathrm{Gly}\text{-}0} \times y^2 \\
X_{\mathrm{ATP}\text{-}3} &= 2 \times X_{\mathrm{Gly}\text{-}2} \times y(1-y) \\
X_{\mathrm{ATP}\text{-}4} &= X_{\mathrm{Gly}\text{-}2} \times y^2
\end{aligned} \qquad (12)$$
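
The system in equation (12) has a single unknown, y, so it can be fitted with a one-dimensional search. The sketch below uses a grid search as a simple stand-in for a least-squares solver; the function names and the synthetic labelling fractions are hypothetical.

```python
def predict_atp(gly_m0, gly_m2, y):
    """ATP M+0..M+4 fractions from equation (12) for formyl labelling y."""
    return [
        gly_m0 * (1 - y) ** 2,
        2 * gly_m0 * y * (1 - y),
        gly_m2 * (1 - y) ** 2 + gly_m0 * y ** 2,
        2 * gly_m2 * y * (1 - y),
        gly_m2 * y ** 2,
    ]


def fit_formyl_labelling(atp_observed, gly_m0, gly_m2, steps=1000):
    """Grid-search least-squares estimate of y on [0, 1]."""
    def sse(y):
        predicted = predict_atp(gly_m0, gly_m2, y)
        return sum((p - o) ** 2 for p, o in zip(predicted, atp_observed))
    return min((i / steps for i in range(steps + 1)), key=sse)


# Hypothetical glycine labelling and ATP data simulated with y = 0.9.
gly_m0, gly_m2 = 0.6, 0.4
atp_observed = predict_atp(gly_m0, gly_m2, 0.9)
y_fit = fit_formyl_labelling(atp_observed, gly_m0, gly_m2)
```

The five predicted ATP fractions sum to X_Gly-0 + X_Gly-2, which gives a quick consistency check on measured data before fitting.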


Cytosolic NADPH production from 10-formyl-THF pathway. Cytosolic NADPH
production from 10-formyl-THF pathway was quantified by tracking its end pro-
ducts: 10-formyl-THF consumed by purine synthesis and CO₂. (Formate excretion
into media is below the detection limit of NMR.) All 10-formyl-THF consumed by
purine synthesis is generated in cytosol and associated with production of 1 NADPH.
For each CO₂ released from serine C3, if the reaction happens in cytosol, 1 NADPH
is produced from 10-formyl-THF oxidation, and another NADPH is produced via
MTHFD1. Total cytosolic NADPH production via 10-formyl-THF pathway is:

$$\mathrm{Flux}_{\text{NADPH from THF pathway}} = 2 \times \mathrm{Flux}_{\text{purine synthesis}} + 2 \times \mathrm{Flux}_{\mathrm{CO_2}\ \text{from serine C3}} \qquad (13)$$

If complete oxidation of serine C3 instead happens in mitochondria, there is no
cytosolic NADPH production associated with CO₂ released from serine C3 (that is,
no red bar in Fig. 3d). Instead, one mitochondrial NADPH is produced from 10-
formyl-THF oxidation, and zero to one other mitochondrial NADPH from 5,10-
methylene-THF oxidation depending on the enzyme used to catalyse the reac-
tion and its cofactor specificity. In mitochondria, this reaction can be catalysed by
MTHFD2, which (at least in the presence of high phosphate in vitro) preferentially
uses NAD⁺ or by MTHFD2L, which uses NADP⁺.

ROS measurement, cell proliferation and cell death assay. ROS measurement
followed published protocols. Briefly, HEK293T cells were incubated with 5 µM
CM-H2DCFDA (Invitrogen) for 30 min. Cells were trypsinized, and mean FL1 fluo-
rescence was measured by flow cytometry. Cell proliferation was measured by tryp-
sinizing cells and counting using a Beckman Multisizer 4 Coulter Counter. To
measure cell death, cells were stained with Trypan Blue. Stained and unstained cells
were counted and cell death percentages tabulated.

Quantification of NADPH consumption by reductive biosynthesis. The gen-
eral strategy for measuring consumption fluxes was as follows: (1) identifying the
biomass components produced in cells grown in DMEM by NADPH-driven re-
ductive biosynthesis (these are DNA, proline and fatty acids); (2) determining the
biomass fraction of each component in each cell line; (3) quantifying the cellular
growth rate R_growth = ln(2)/t_1/2; (4) measuring the fractional contribution of dif-
ferent biosynthetic routes to each biomass component via experiments with [¹³C]-
labelled glucose and/or glutamine and LC-MS analysis; (5) computing the average
number of NADPH per unit of biomass component, which equals the sum of the
fractional contribution of each route multiplied by the number of NADPH con-
sumed by that route; and (6) determining NADPH consumption as follows:

$$\text{Consumption flux} = \frac{\text{Product abundance}}{\text{Cell volume}} \times R_{\mathrm{growth}} \times (\text{Average NADPH per product}) \qquad (14)$$


The required data were acquired as follows below. 
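Steps (3), (5) and (6) above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the function names, the units (nmol, µl, h) and the example numbers are hypothetical.

```python
import math


def growth_rate(doubling_time_h):
    """Step (3): R_growth = ln(2) / t_1/2, in h^-1."""
    return math.log(2) / doubling_time_h


def average_nadph_per_product(routes):
    """Step (5): sum over routes of (fractional contribution x NADPH consumed).

    routes -- iterable of (fraction, nadph_per_unit) pairs; the fractions
              are assumed to sum to 1.
    """
    return sum(fraction * nadph for fraction, nadph in routes)


def nadph_consumption_flux(product_abundance, cell_volume, doubling_time_h, routes):
    """Step (6), equation (14)."""
    return (
        (product_abundance / cell_volume)
        * growth_rate(doubling_time_h)
        * average_nadph_per_product(routes)
    )


# Hypothetical example: 10 nmol of product in 2 µl of cells, 24 h doubling time,
# 75% made by a route that consumes 2 NADPH and 25% imported (0 NADPH).
example_routes = [(0.75, 2.0), (0.25, 0.0)]
example_flux = nadph_consumption_flux(10.0, 2.0, 24.0, example_routes)  # nmol µl^-1 h^-1
```

Keeping the route list explicit makes it easy to see how imported material, which consumes no NADPH, lowers the average NADPH per product.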

DNA. Cellular DNA and RNA were extracted and separated with TRIzol reagent 
(Invitrogen), purified and quantified by Nanodrop spectrophotometer. 

Fatty acids. Total cellular lipid was extracted and saponified after addition of
isotope-labelled internal standards for C16:0, C16:1, C18:0 and C18:1. Samples were
analysed by negative ESI-LC-MS with LC separation on a C8 column. Concentrations of
other fatty acids, for which isotope-labelled internal standards were not available,
were measured by comparison to the palmitate internal standard. The calculated fatty
acid concentrations were multiplied by a correction factor to account for incomplete
lipid recovery in the first step of the sample preparation procedure. This correction
factor was empirically determined to be 1.9 by experiments in which lipid standards
were spiked into extraction solution.

The extent of fatty acid synthesis and elongation (both of which consume NADPH) was
determined by feeding cells [U-13C]glucose and [U-13C]glutamine for multiple
doublings to achieve pseudo-steady state labelling of their lipid pools. Fatty acid
labelling patterns were measured and computationally simulated to quantify the
fraction of production versus import for each individual fatty acid species. Extended
Data Fig. 9 shows the associated data for C16:0, C16:1, C18:0 and C18:1, which
together account for ~80% of total cellular fatty acids and >90% of non-essential
fatty acids (essential fatty acids are imported, not synthesized, and thus do not
affect NADPH production). NADPH calculations include similar data for all measurable
fatty acids.

Proline. Proline can be made from either arginine or glutamate. Proline synthesis
from either substrate requires two high-energy electrons at the step catalysed by
pyrroline-5-carboxylate reductase, which may use NADH or NADPH (for simplicity, we
assume an equal contribution from each). Proline synthesis from glutamate consumes
one additional NADPH. To quantify the fraction of proline synthesized from each
substrate, cells were labelled with [U-13C]glutamine to steady state, which labels
glutamate but not arginine. Labelling of intracellular proline and glutamate was
measured.

\[
X_{\mathrm{Glu}}
  = \frac{\text{Fraction proline } ^{13}\text{C-labelled}}
         {\text{Fraction glutamate } ^{13}\text{C-labelled}}
\tag{15}
\]

\[
\mathrm{Flux}_{\text{NADPH for proline}}
  = \frac{\text{Growth rate} \times \text{Protein content}}
         {\text{Average formula weight per residue}}
  \times \text{Proline frequency}
  \times \bigl(1.5\,X_{\mathrm{Glu}} + 0.5\,(1 - X_{\mathrm{Glu}})\bigr)
\tag{16}
\]

Proline synthesis enzymes are present in both the cytosol and mitochondria. For
simplicity, Fig. 4 assumes exclusively cytosolic proline synthesis.
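
Equations (15) and (16) can be checked numerically with a short sketch. The function and parameter names are illustrative assumptions, and the caller is assumed to supply inputs in mutually consistent units (for example, protein content in g per unit cell volume and formula weight in g per mol).

```python
def glutamate_fraction(frac_proline_labelled, frac_glutamate_labelled):
    """Equation (15): fraction of proline made from glutamate (X_Glu)."""
    return frac_proline_labelled / frac_glutamate_labelled


def nadph_flux_for_proline(growth_rate, protein_content, avg_residue_weight,
                           proline_frequency, x_glu):
    """Equation (16).

    The glutamate route is charged 1.5 NADPH per proline (one NADPH plus half
    of the pyrroline-5-carboxylate reductase step); the arginine route is
    charged 0.5 NADPH (half of the reductase step only).
    """
    residues_per_time = growth_rate * protein_content / avg_residue_weight
    nadph_per_proline = 1.5 * x_glu + 0.5 * (1.0 - x_glu)
    return residues_per_time * proline_frequency * nadph_per_proline


# Hypothetical labelling fractions, for illustration only.
x_glu_example = glutamate_fraction(0.6, 0.8)
```

With these hypothetical labelling fractions, three quarters of proline would be attributed to the glutamate route, giving an average of 1.25 NADPH per proline.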

Extended Data Figure 1 | Probing the fractional contribution of the oxidative pentose
phosphate pathway to NADPH production with [2H]glucose. a, Example of LC-MS
chromatogram of M+0 and M+1 forms of NADPH and NADP+. Plotted values are 5 p.p.m.
mass window around each compound. b, Extent of NADPH labelling must be corrected for
extent of glucose-6-phosphate labelling. Incomplete labelling can occur due to influx
from glycogen or hydrogen-deuterium exchange. c, Labelling fraction of
glucose-6-phosphate and fructose-1,6-phosphate in iBMK cells with and without
activated Akt (20 min after switching into [1-2H]glucose). d, Labelling fraction of
fructose-1,6-phosphate and 6-phosphogluconate after feeding [1-2H]glucose. Labelling
fraction of fructose-1,6-phosphate reflects the labelling of glucose-6-phosphate,
whose peak after addition of the [2H]glucose was not sufficiently resolved from other
LC-MS peaks in HEK293T and MDA-MB-468 cells to allow precise quantification of its
labelling directly. The difference in the labelling fraction between
glucose-6-phosphate and 6-phosphogluconate reflects the fraction of deuterium
labelling specifically at position 1 of glucose-6-phosphate. e, Due to the kinetic
isotope effect, feeding of deuterium tracer can potentially alter pathway fluxes. To
assess whether the feeding of [1-2H]glucose creates a bottleneck in the oxidative
pentose phosphate pathway, we measured the relative concentration of oxidative
pentose phosphate pathway intermediates with or without feeding of [1-2H]glucose. No
significant changes were observed. f, Effect of different mechanisms of correcting
for the deuterium kinetic isotope effect on fractional contribution of oxidative
pentose phosphate pathway to NADPH production. g, Effect of different mechanisms of
correcting for the deuterium kinetic isotope effect on calculated total NADPH
production rate. The correction mechanisms are: (1) no kinetic isotope effect
(C_KIE = 1), (2) no effect on total pathway flux but preferential utilization of 1H
over 2H-labelled substrate (equation (4) of main text) (the smallest reasonable
correction, and the one applied in the main text), or (3) full kinetic isotope effect
observed for the isolated enzyme with associated decrease in total pathway flux
(equation (6) of Methods) (the largest reasonable correction). All results are
mean ± s.d., n = 2 biological replicates from a single experiment and results were
confirmed in multiple experiments.

Extended Data Figure 2 | Two independent measurement methods give consistent
oxidative pentose phosphate pathway fluxes. a, Diagram of [1-14C]glucose and
[6-14C]glucose metabolism through glycolysis and the oxidative pentose phosphate
pathway. The oxidative pentose phosphate pathway specifically releases glucose C1 as
CO2, whereas all other CO2-releasing reactions are downstream of triose phosphate
isomerase (TPI). As TPI renders C1 and C6 of glucose indistinguishable (both
positions become C3 of glyceraldehyde-3-phosphate), the difference in CO2 release
from C1 versus C6, multiplied by two, gives the absolute rate of NADPH production via
oxidative pentose phosphate pathway. A potential complication involves carbon
scrambling via the reactions of the non-oxidative pentose phosphate pathway, but this
was negligible (see Extended Data Fig. 3). b, Complete carbon labelling of
glucose-6-phosphate. Glucose-6-phosphate was labelled completely (> 99%) within 2 h
of switching cells into [U-13C]glucose. c, CO2 release rate from [1-14C]glucose and
[6-14C]glucose. d, Pool size of 6-phosphogluconate. e, Kinetics of
glucose-6-phosphate and 6-phosphogluconate labelling upon switching cells to
[U-13C]glucose. f, Overlay upon the 6-phosphogluconate data from e of simulated
labelling curves based on the flux that best fits the labelling kinetics (blue) (see
Methods), and the flux from 14CO2 release measurements (green). g, Calculated fluxes
and 95% confidence intervals based on kinetics of 6-phosphogluconate labelling from
[U-13C]glucose, compared to radioactive CO2 release from [1-14C]glucose and
[6-14C]glucose. The two approaches give consistent results, with the 14CO2 release
data being more precise. Mean ± s.d., n = 3.
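
The arithmetic described in panel a can be written out as a small sketch. It assumes the two CO2 release rates have already been converted to absolute rates in the same units and that scrambling is negligible; the function names are hypothetical.

```python
def oxppp_flux(co2_from_c1, co2_from_c6):
    """Oxidative pentose phosphate pathway flux: the excess CO2 released
    from glucose C1 over glucose C6 (same units as the inputs)."""
    return co2_from_c1 - co2_from_c6


def oxppp_nadph_production(co2_from_c1, co2_from_c6):
    """NADPH production via the oxidative pentose phosphate pathway:
    the C1-versus-C6 difference in CO2 release, multiplied by two."""
    return 2.0 * oxppp_flux(co2_from_c1, co2_from_c6)
```

Subtracting the C6 rate removes CO2 released downstream of triose phosphate isomerase, where C1 and C6 are indistinguishable, so only pathway-specific release remains.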

Extended Data Figure 3 | The extent of carbon scrambling via non-oxidative pentose
phosphate pathway is insufficient to substantially affect oxidative pentose phosphate
pathway flux determination using [1-14C]glucose and [6-14C]glucose, with most carbon
entering oxidative pentose phosphate pathway directed towards nucleotide synthesis.
a, Schematic of glycolysis and pentose phosphate pathway showing fate of glucose C6.
Note that glucose C6 occupies the phosphorylated position (that is, the last carbon)
in every intermediate. Thus, upon catabolism to pyruvate, glucose C6 always becomes
pyruvate C3, irrespective of any potential scrambling reactions. b, Schematic of
glycolysis and pentose phosphate pathway showing fate of glucose C1. Glucose C1 can
be scrambled via the non-oxidative pentose phosphate pathway, moving to C3 (red
boxes) or C6 as shown here. The forms shown in the green boxes were not
experimentally observed. As glucose C3 becomes pyruvate C1 (the carboxylic acid
carbon of pyruvate), which is selectively released as CO2 by pyruvate dehydrogenase,
scrambling of C1 to C3 can potentially increase CO2 release from glucose C1 relative
to C6. This is ruled out in panels d and e. c, Feeding [1-13C]glucose or
[6-13C]glucose results in 50% labelling of 3-phosphoglycerate without any double
labelling (that is, M+2), as expected in the absence of scrambling. d, MS/MS method
to analyse positional labelling of 1-labelled pyruvate. Collision induced
dissociation breaks pyruvate to release the carboxylic acid group as CO2. If the
daughter peak of 1-labelled pyruvate does not contain labelled carbon (m/z = 43), the
labelling is at the C1 position; otherwise, it is at C2 or C3. e, After feeding
[1-13C]glucose or [6-13C]glucose, pyruvate is not labelled at the C1 position
(< 0.5%), ruling out extensive scrambling. f, Oxidative pentose phosphate pathway
flux is similar to or smaller than ribose demand for nucleotide synthesis.
Mean ± s.d., n = 3.

Extended Data Figure 4 | Probing the contribution of alternative NADPH producing
pathways. a, Pathway diagram showing potential for [2,3,3,4,4-2H]glutamine to label
NADPH via glutamate dehydrogenase and via malic enzyme. Labelled hydrogens are shown
in red. b, NADP+ and NADPH labelling patterns (without correction for natural
13C-abundance) after 48 h incubation with [2,3,3,4,4-2H]glutamine. The
indistinguishable labelling of NADP+ and NADPH implies lack of NADPH redox active
hydrogen labelling. c, Pathway diagram showing potential for [2,3,3-2H]aspartate to
label NADPH via isocitrate dehydrogenase. d, NADP+ and NADPH labelling patterns
(without correction for natural 13C-abundance) after 48 h incubation with
[2,3,3-2H]aspartate. The indistinguishable labelling of NADP+ and NADPH implies lack
of redox active hydrogen labelling. e, Diagram of [2,3,3,4,4-2H]glutamine metabolism
through TCA cycle, tracing labelled hydrogen. Hydrogen atoms of lighter shade
indicate potential H/D exchange with water. f, Malate labelling fraction after cells
were supplied with [2,3,3,4,4-2H]glutamine for 48 h. g, Pathway diagram showing
potential for [1,2,3-13C]malate (made by feeding [U-13C]glutamine) to label pyruvate
and lactate via malic enzyme. h, Extent of malate and pyruvate/lactate 13C-labelling.
Cells were incubated with [U-13C]glutamine for 48 h. M+3 pyruvate indicates malic
enzyme flux, which may generate either NADH or NADPH. Similar results were obtained
also for M+3 lactate, which was used as a surrogate for pyruvate in cases in which
lactate was better detected. The corresponding maximal possible malic enzyme-driven
NADPH production rate ranges, depending on the cell line, from < 2 nmol µl−1 h−1
(based on the limit of detection of M+3 pyruvate) to 6 nmol µl−1 h−1. Mean ± s.d.,
n = 2.

Extended Data Figure 5 | Computational and experimental evidence for THF-dependent
NADPH production. a, Predicted contribution of folate metabolism to NADPH production
based on flux balance analysis, using minimization of total flux as the objective
function, across different biomass compositions. The biomass fraction of cell dry
weight consisting of protein, nucleic acid and lipid was varied as follows: protein
50-90% with a step size of 10%; RNA/DNA 3-20% with step size of 1%; and lipids 3-20%
with step size of 1% (considering only those combinations that sum to no more than
100%). With this range of physiologically possible biomass compositions, the model
predicts a median contribution of folate metabolism of 24%. Note that with the
constraint of experimentally measured biomass composition, yet without constraining
the uptake rate of amino acids other than glutamine to be ≤ 1/3 of the glutamine
uptake rate, the contribution of folate pathway to total NADPH production is
predicted to be 23%. b, Range of feasible flux through NADPH producing reactions in
Recon1 model computed via flux variability analysis under the constraint of maximal
growth rate. As shown, the model predicts that each NADPH producing reaction can
theoretically have zero flux, with all NADPH production proceeding through
alternative pathways. Only reactions whose flux upper bound is greater than zero are
shown. Reactions producing NADPH via a thermodynamically infeasible futile cycle were
manually removed. As shown, among all NADPH producing reactions, MTHFD has the
highest flux consistent with maximal growth. c, Pathway diagram showing potential for
[2,3,3-2H]serine to label NADPH via methylene tetrahydrofolate dehydrogenase.
d, NADP+ and NADPH labelling pattern after 48 h incubation with [2,3,3-2H]serine (no
glycine present in the media). The greater abundance of more heavily labelled forms
of NADPH relative to NADP+ indicates redox active hydrogen labelling. Results are
mean ± s.d., n = 2 biological replicates from a single experiment and were confirmed
in n = 2 experiments. Based on the data in panel d, the contribution of MTHFD1 to
cytosolic NADPH production spans a broad range (10-40% of total cytosolic NADPH; the
range is due to variation across cell lines, experimental noise, and the large KIE).
This range includes the flux calculated based on purine biosynthetic rate and 14CO2
release from serine (Fig. 3d). Note that the total contribution of the cytosolic
folate metabolism to NADPH production can exceed that of MTHFD1, as 10-formyl-THF
dehydrogenase also produces NADPH.


©2014 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 6 plot residue: panel a, serine and ATP across isotopologues
M+0 to M+4; panel c, 13C-serine; panel e, serine C3 and dTTP in HEK293T, MDA-MB-468,
iBMK-parental and iBMK-Akt cells.]

Extended Data Figure 6 | One-carbon units used in purine and thymidine synthesis are
derived from serine. a, Serine and ATP labelling pattern after 24 h incubation of
HEK293T cells with [U-13C]serine. The presence of M+1 to M+4 ATP indicates that
serine contributes carbon to purines both through glycine and through one-carbon
units derived from serine C3. b, Quantitative analysis of cytosolic one-carbon unit
labelling from the measured intracellular ATP, glycine and serine labelling reveals
that most cytosolic 10-formyl-THF assimilated into purines comes from serine.
c, [U-13C]serine labels the methyl group that distinguishes dTTP from UTP.
d, [U-13C]glycine does not label dTTP. e, The extent of dTTP labelling mirrors the
extent of intracellular serine labelling. f, Methionine does not label from
[U-13C]glycine. In all experiments, cells were grown in [U-13C]serine or glycine for
48 h. Mean ± s.d., n = 3.

Extended Data Figure 7 | Measurement of CO2 release rate from serine and glycine by
combination of 14C- and 13C-labelling. a, 14CO2 release rate when cells are supplied
with a medium with a trace amount of [3-14C]serine, [1-14C]glycine or
[2-14C]glycine. b, Fraction of intracellular serine labelled in cells grown in DMEM
media containing 0.4 mM [3-13C]serine in place of unlabelled serine. The residual
unlabelled serine is presumably from de novo synthesis. c, Fraction of intracellular
glycine labelled in cells grown in DMEM medium containing 0.4 mM [U-13C]glycine in
place of unlabelled glycine. d, CO2 release rates from serine C3, glycine C1 or C2.
e, Potential alternative pathway to metabolize glycine or serine into CO2 via
pyruvate. f, Pyruvate labelling fraction after 48 h labelling with [U-13C]serine or
[U-13C]glycine. The lack of labelling in pyruvate indicates that serine and glycine
are not metabolized through this pathway. g, Knockdown of MTHFD2 or ALDH1L2 decreases
CO2 release from glycine C2. h, Knockdown of ALDH1L2 decreases the GSH/GSSG ratio.
Mean ± s.d., n = 3.

Extended Data Figure 8 | In the absence of serine, elevated concentrations of glycine
inhibit cell growth and decrease the NADPH/NADP+ ratio. a, Schematic of serine
hydroxymethyltransferase reaction. High glycine may either inhibit forward flux
(product inhibition) or drive reverse flux. b, Relative cell number after culturing
HEK293T cells for 3 days in regular DMEM, DMEM with no serine or DMEM with no serine
and 12.5-times the normal concentration of glycine (5 mM instead of 0.4 mM).
c, Relative NADPH/NADP+ ratio (normalized to cells grown in DMEM) after culturing
HEK293T cells for 3 days in regular DMEM, DMEM with no serine or DMEM with no serine
and 12.5-times the normal concentration of glycine. d, e, Labelling of serine and
glycine after feeding [U-13C]serine or [U-13C]glycine reveals reverse serine
hydroxymethyltransferase flux. Mean ± s.d., n = 3.

Extended Data Figure 9 | Quantitative analysis of NADPH consumption for biomass
production and antioxidant defence. a, Cell doubling times, which are inversely
proportional to biomass production rates. b, Cellular protein content. c, Cellular
fatty acid content (from saponification of total cellular lipid). d, Quantification
of fatty acid synthesis versus import, with synthesis but not import requiring NADPH.
HEK293T cells were cultured in [U-13C]glucose and [U-13C]glutamine until
pseudo-steady state, fatty acids were saponified from total cellular lipids and their
labelling patterns measured (green bars), and production versus import of each fatty
acid was simulated based on these experimental data. The fractional contribution of
each route was determined by least square fitting, with the theoretical labelling
pattern based on the elucidated routes shown (pink bars). Similar data were obtained
also for MDA-MB-468, iBMK-parental and iBMK-Akt cells (not shown) and used to
calculate associated NADPH consumption by fatty acid synthesis. e, Cellular DNA and
RNA contents. f, NADPH consumption by de novo DNA synthesis. g, Proline and glutamate
labelling patterns after 24 h in [U-13C]glutamine media, which were used to
quantitate different proline synthesis routes and associated NADPH consumption.
h, Quantitative analysis of cytosolic NADPH consumption in normally growing HEK293T
cells (control) and non-growing cells under oxidative stress (150 µM H2O2, 5 h).
Total cytosolic NADPH turnover was measured based on the absolute oxidative pentose
phosphate pathway flux divided by the fractional contribution of the oxidative
pentose phosphate pathway to total NADPH as measured using [2H]NADPH formation from
[1-2H]glucose. Mean ± s.d., n = 3.

western blot or qPCR. a, Western blot for G6PD knockdown. b, Western blot of HEK293T with stable knockdown of indicated genes (results for different

COLUMN
A test that fails 


A standard test for admission to graduate school misses 
potential winners, say Casey Miller and Keivan Stassun. 

Universities in the United States rely too heavily on the graduate record
examinations (GRE), a standardized test introduced in 1949 that is an admissions
requirement for most US graduate schools. This practice is poor at selecting the most
capable students and severely restricts the flow of women and minorities into the
sciences.

We are not the only ones to reach this con- 
clusion. William Sedlacek, professor emeritus 
of education at the University of Maryland, 
College Park, who has written extensively on 
the issue, notes that studies find only a weak 
correlation between the test and ultimate suc- 
cess in science, technology, engineering and 
maths (STEM) fields. De-emphasizing the GRE 
and augmenting admissions procedures with 
measures of other attributes — such as drive, 
diligence and the willingness to take scientific 
risks — would not only make graduate admis- 
sions more predictive of the ability to do well 
but would also increase diversity in STEM. 


TEST DISPARITIES 

The GRE, like most standardized tests, reflects 
certain demographic characteristics of test-tak- 
ers — such as family socioeconomic status — 
that are unrelated to their intellectual capacity or 
academic preparation. The exam’s ‘quantitative 
score’ — the portion measuring maths acumen, 
which is most commonly scrutinized in admis- 
sions to STEM PhD programmes — correlates 
closely with gender and ethnicity (see ‘The great
divide’). The effect is powerful. According to
data from Educational Testing Service (ETS), 
based in Princeton, New Jersey, the company 
that administers the GRE, women score 80 
points lower on average in the physical sciences 
than do men, and African Americans score 200 
points below white people. In simple terms, the 
GRE is a better indicator of sex and skin colour 
than of ability and ultimate success. 

These correlations and their magnitude are 
not well known to graduate-admissions com- 
mittees, which have a changing rota of faculty 
members. Compounding the problem, some 
admissions committees use minimum GRE 
scores to rapidly filter applications; for exam- 
ple, any candidate scoring below 700 on the 
800-point quantitative test section may be dis- 
carded. Using GRE scores to filter applicants in 
this way is a violation of ETS’s own guidelines. 

This problem is rampant. If the correlation 
between GRE scores and gender and ethnicity 
is not accounted for, imposing such cut-offs 
adversely affects women and minority applicants. For example, in the physical
sciences, only 26% of women, compared with 73% of men, score above 700 on the GRE
Quantitative measure. For minorities, this falls to 5.2%, compared with 82% for white
and Asian people.

The misuse of GRE scores to select appli- 
cants may be a strong driver of the continuing 
under-representation of women and minorities 
in graduate school. Indeed, women earn barely 
20% of US physical-sciences PhDs, and under- 
represented minorities — who account for 33% 
of the US university-age population — earn just 
6%. These percentages are striking in their sim- 
ilarity to the percentage of students who score 
above 700 on the GRE quantitative measure. 

Why is the GRE misused? Admissions 
committees are busy, and numerical rank- 
ings are easy to sort. We believe that faculty 
members also often presume that higher scores 
imply that the test-taker has a greater ability 
to become a PhD-level scientist. Yet research 
by ETS indicates that the predictive validity of 
the GRE tests is limited to first-year graduate- 
course grades, and even that correlation is 
meagre in maths-intensive STEM fields. 

Why should graduate-admissions commit- 
tees care about fixing the problem? First, diver- 
sity, in the form of individuals with different 
perspectives, backgrounds and experiences, is 
a key component of innovation and problem 
solving, a concept that business and industry 
have come to recognize. Less diversity in STEM 
graduate programmes means slower progress 
in tackling today’s scientific and technical chal- 
lenges. Second, the overall PhD completion rate 
in US STEM graduate programmes is a disap- 
pointing 50%. Although graduate programmes 
certainly produce successful students who con- 
tinue on to productive science careers, we think 
that many faculty members would agree that 
such a low PhD completion rate is a poor return
on the investment in recruiting and training stu- 
dents. Indeed, STEM graduate programmes are 
failing not only from the diversity standpoint, 
but also from a success standpoint. 


ALTERNATIVE SELECTION 

So what should universities do? Instead of fil- 
tering by GRE scores, graduate programmes 
can select applicants on the basis of skills and 
character attributes that are more predictive of 
doing well in scientific research and of ultimate 
employability in the STEM workforce. Apprais- 
ers should look not only at indicators of previ- 
ous achievements, but also at evidence of ability 
to overcome the tribulations of becoming a 
PhD-level scientist. 

A few innovative PhD programmes, includ- 
ing the bridge programmes at the University of 
South Florida in Tampa and Fisk-Vanderbilt in
Nashville, Tennessee (in which we are involved) 
are doing this. They have achieved completion 
rates above 80%, well above the national aver- 
age, and are greatly boosting participation by 
women and minorities (see Nature 504, 471-473; 2013). The admissions process includes
an interview that examines college and research experiences, key relationships,
leadership experience, service to community and life goals. The result is a good
indication of the individual's commitment to scientific research and a good
assessment of traits such as maturity, perseverance, adaptability and
conscientiousness atop a solid academic foundation. The combination of academic
aptitude and these other competencies points to the likelihood of high achievement in
graduate school and in a STEM career.

THE GREAT DIVIDE

The data represent the scores typically achieved in the quantitative reasoning test
of the graduate record examinations (GRE) by US students from different ethnic groups
applying for graduate school. In the physical sciences, a minimum score of 700 is
required by many PhD programmes.

How have the students admitted to these 
courses performed? In the Fisk-Vanderbilt 
programme, 81% of the 67 students who have 
entered the programme — including 56 under- 
represented minorities and 35 women — have 
earned, or are making good progress towards, 
their PhDs. And all students who have com- 
pleted PhDs are employed in the STEM work- 
force as postdocs, university faculty members 
or staff scientists in national labs or industry. 
From the standpoint of optimal outcomes — 
earning a PhD and obtaining employment in 
the STEM workforce — the GRE has proved 
irrelevant. Indeed, 85% of these young scien- 
tists would have been eliminated from con- 
sideration for PhD programmes by a GRE 
quantitative cut-off score of 700. 

The only downside is that interviews take 
about 30 minutes each. But the number of 
interviews need not be large, and the tremen- 
dous insight garnered justifies the time. ETS is 
even marketing a tool for referees to evaluate 
applicants’ personal attributes. The company 
developed it in part as a response to calls from 
applicants and graduate programmes for alter- 
native measures of student potential for long- 
term achievement that is not captured by GRE. 

We often hear admissions committee members say, “We would admit women and minorities
if they were qualified.” This mindset reflects long-standing admissions practices
that systematically, if inadvertently, filter out women and minorities. At the same
time, these practices are no better than a coin flip at identifying candidates with
the potential, and the mettle, to earn a PhD.

[Figure ‘The great divide’: axis labels list men, women, and Hispanic American,
African American, Puerto Rican and Native American test-takers.]

Let us be frank: we believe that many STEM faculty members on admissions committees
and upper-level administrators hold a deep-seated and unfounded belief that these
test scores are good measures of ability, of potential for doing well in graduate
school and of long-term potential as a scientist, and that students who score poorly
on standardized exams are not likely to become PhD-level scientists. These
assumptions are false.

This is not a call to admit unqualified stu- 
dents in the name of social good. This is a call 
to acknowledge that the typical weight given 
to GRE scores in admissions is dispropor- 
tionate. If we diminish reliance on GRE and 
instead augment current admissions practices 
with proven markers of achievement, such 
as grit and diligence, we will make our PhD 
programmes more inclusive and will more 
efficiently identify applicants with potential 
for long-term success as researchers. Isn't that 
what graduate school is about?


Casey Miller is an associate professor in the physics department at the University of
South Florida in Tampa. Keivan Stassun is a professor in physics and astronomy at
Vanderbilt University and Fisk University in Nashville, Tennessee.

FACE IN THE DARK


BY DEBORAH WALKER


Conglomerate time ticked. In the 24-hour regime of the habitat, the automated lights
faded to simulated night. The night blinds shuttered into place. Throughout the
Conglomerate colonies, time is unified and indifferent to the rotation of local suns.

Dan lay in the darkness counting 
the deep, slow breaths of his wife, 
drifting towards the pull of sleep, 
until... he saw it. He shuddered. 

“What's the matter?” Arrelle whis- 
pered. 

“I thought you were asleep.” 

“No.”

The bed was small, but she seemed 
too far away. Dan slipped closer and 
felt the warmth of her body. 

“Do you see one?” she asked. 

“It’s by the window.” Dan pointed
to the shadows, at the figure resolved 
out of darkness. The indistinct, small 
shape walked with jerky movements 
like a worn recording made on an 
antiquated device. 

“I see it,” said Arrelle, her voice devoid of
inflection. “The doctor ran the tests again, 
today. He says that the air’s clean. There's no 
trace of hallucinogens.” 

The shadow figure moved a few centi- 
metres. 

“I’m beginning to think that we've all gone
mad,” said Arrelle.

Dan said nothing. 

“Did you hear me, Dan?” 

Dan hesitated before saying: “Father 
McConnell will try another exorcism 
tomorrow.”

Arrelle sighed. “It won't do any good.” 

McConnell, the colony’s trans-faith priest, 
had read the Conglomerate Book of Devotion 
blank trying different exorcisms. All human 
things have value. Faith was levelled into a 
matter of equality. A Catholic exorcism was 
equal to a Voodoo prayer chant or to a Vedic 
mantra. This had proved to be the case. Each 
of McConnell’s expulsions had had the same 
result: failure. 

“What do you think it wants, Dan?” 

“I... don’t know.” Can the dead want anything?
“Perhaps it’s just a restligeist.” That

was the most popular 
Those who came before.


or the stones of this planet.” 

“The Stone Tape theory? You think they’re 
alien memories somehow imprinted onto 
the wall. A psychic echo?” asked Arrelle. She 
looked at the shadow creeping over the wall. 
“How can that be true? It doesn’t make any 
sense. The Conglomerate doesn't recognize 
psychic phenomena.”


“I know,” said Dan. “But they’ve run the
tests. There’s no electromagnetic distur- 
bances, or changes in ionization, or changes 
in the radiation levels. There’s nothing we 
can detect.” 

Arrelle touched Dan’s arm. “Why can’t you 
do something? You work in the labs. Why 
don’t you do something?” 

“I’m a geologist, Arrelle,” said Dan. “The
medical and the physics teams are doing
everything they can.” In fact, it seemed that
most of the conversations in the mining 
labs were devoted to speculations about the 
night visitors. The labs were lagging behind 
the Conglomerate’s schedules. Dan was a 
supervisor, but he had no idea how he could 
motivate his team. The colony was slowly 
falling apart because of these night shadows. 

Arrelle pulled the sheets around her neck. 
“I’m so cold.”

“It’s all in your imagination.” Dan tapped
a command into his wrist-bracelet. He held 
the small screen towards her, lighting her 
face with flickering chiaroscuro fluores- 
cence. “Look,” he said. “Look. There’s no 
change in temperature.” 

“Then why am I so cold?” 

The shadow figure slowly turned. It raised 
its three arms above its head and slowly 
sketched out an incomprehensible gesture. 

“I read the Conglomerate’s first contact 
protocols again,” said Arrelle. 
“How can we make first contact with that?”

“I want to try, Dan. We can’t live like this.”

“And what would you say?” 

“I’d say that we were sorry.”

A small oval of red light formed in the 
shadow’s face. “Its eye is open,” said Dan.
“That means that it'll be gone soon.” Its 
red eye hung in the sky of its face like an 
accusation. 

Dan and Arrelle watched as the 
shadow moved along the wall. When 
it reached the corner it faded out of 
sight. 

“I wish we could leave,” said
Arrelle. She curled lower into the 
sheets, folding her body, knees tight 
to her chest. 

“You know that we can't,” said 
Dan. It was impossible to leave a 
Conglomerate colony. Once the seed 
ship had deposited its frozen bodies, 
it moved on, following the scout 
ships along the path of Conglomer- 
ate expansion. Conglomerate colo- 
nization was an efficient, automated 
process. The colonists slowly melted 
into re-life, awakening to a colony made safe 
and habitable by the soldier and construc- 
tion auto-drones. 

Dan sighed. He got out of bed, reaching 
for his robe. Whatever the computer said, it 
was cold in this room. He walked over to the 
window and touched a button to iris open 
the blinds. Next time he saw the thing, he’d
keep quiet. Talking about it with Arrelle was 
only making things worse. Whatever the 
shadow was, there was nothing Dan could 
do about it. 

The low red-eye sun, slung in the sky, cast 
long shadows. In the distance, against the 
rose-red light, the towers of the ruined city 
were bone white needles, extending towards 
the horizon. It must have once been mar- 
vellous. Dan saw a discarded Conglomerate 
Humvee buggy. He saw the remains of a soldier
auto-drone. He saw the fields of three-armed
skeletons, bones large and small,
slowly dissolving in the acid atmosphere. 

“It’s not our fault,” whispered Dan. “We 
didn’t know.”

“We never asked,” said Arrelle. “We just
accepted the Conglomerate’s offer.” 

Across the horizon the shadows moved, 
remnants of the evicted dead who were 
loath to leave their home.


Find Deborah in the British Museum 
trawling the past for future inspiration. 